Abstract

The probability distribution P from which the history of our universe is sampled
represents a theory of everything or TOE. We assume P is formally describable. Since
most (uncountably many) distributions are not, this imposes a strong inductive bias.
We show that P(x) is small for any universe x lacking a short description, and study the
spectrum of TOEs spanned by two Ps, one reflecting the most compact constructive
descriptions, the other the fastest way of computing everything. The former derives
from generalizations of traditional computability, Solomonoff's algorithmic probability,
Kolmogorov complexity, and objects more random than Chaitin's Omega, the latter
from Levin's universal search and a natural resource-oriented postulate: the cumulative
prior probability of all x incomputable within time t by this optimal algorithm
should be 1/t. Between both Ps we find a universal cumulatively enumerable measure
that dominates traditional enumerable measures; any such CEM must assign low
probability to any universe lacking a short enumerating program. We derive P-specific
consequences for evolving observers, inductive reasoning, quantum physics, philosophy,
and the expected duration of our universe.



10 theorems, 50 pages, 100 references, 20000 words 



Keywords: formal describability, constructive mathematics, randomness, pseudorandomness,
minimal description length, generalized Kolmogorov complexity, complexity hierarchy,
algorithmic probability, halting probability Omega, convergence probability, semimeasures, 
cumulatively enumerable measures, universal priors, speed prior, universal search, inductive 
inference, Occam's razor, computable universes, theory of everything, collapse of the wave 
function, many worlds interpretation of quantum mechanics, countable vs uncountable. 



Note: This is a slightly revised version of a recent preprint. The essential results should
be of interest from a purely theoretical point of view independent of the motivation through
formally describable universes. To get to the meat of the paper, skip the introduction and go
immediately to Subsection 1.1, which provides a condensed outline of the main theorems.
1 Introduction to Describable Universes

An object X is formally describable if a finite amount of information completely describes
X and only X. More to the point, X should be representable by a possibly infinite bitstring
x such that there is a finite, possibly never halting program p that computes x and nothing
but x in a way that modifies each output bit at most finitely many times; that is, each finite
beginning of x eventually converges and ceases to change. Definitions 2.1 - 2.5 will make this
precise, and later sections will clarify that this constructive notion of formal describability is
less restrictive than the traditional notion of computability, mainly because we do not
insist on the existence of a halting program that computes an upper bound of the convergence
time of p's n-th output bit. Formal describability thus pushes constructivism to the
extreme, barely avoiding the nonconstructivism embodied by even less restrictive concepts of
describability (compare computability in the limit and $\Delta^0_n$-describability [p. 46-47]).
The results in the following sections will exploit the additional degrees of freedom gained
over traditional computability, while Section 6 will focus on another extreme, namely, the
fastest way of computing all computable objects.

Among the formally describable things are the contents of all books ever written, all
proofs of all theorems, the infinite decimal expansion of $\sqrt{17}$, and the enumerable "number
of wisdom" $\Omega$ [21]. Most real numbers, however, are not individually describable,
because there are only countably many finite descriptions, yet uncountably many reals, as
observed by Cantor in 1873. It is easy though to write a never halting program that
computes all finite prefixes of all real numbers. In this sense certain sets seem describable
while most of their elements are not.
What about our universe, or more precisely, its entire past and future history? Is it
individually describable by a finite sequence of bits, just like a movie stored on a compact 
disc, or a never ending evolution of a virtual reality determined by a finite algorithm? If so, 
then it is very special in a certain sense, just like the comparatively few describable reals are 
special. 



Example 1.1 (Pseudorandom universe) Let x be an infinite sequence of finite bitstrings
$x^1, x^2, \ldots$ representing the history of some discrete universe, where $x^k$ represents the state of
the universe at discrete time step k, and $x^1$ the "Big Bang". Suppose there is
a finite algorithm A that computes $x^{k+1}$ ($k \geq 1$) from $x^k$ and additional information $noise^k$
(this may require numerous computational steps of A, that is, "local" time of the universe
may run comparatively slowly). Assume that $noise^k$ is not truly random but calculated by
invoking a finite pseudorandom generator subroutine. Then x is describable because it
has a finite constructive description.



Contrary to a widely spread misunderstanding, quantum physics, quantum computation
and Heisenberg's uncertainty principle do not rule out that our own universe's
history is of the type exemplified above. It might be computable by a discrete process
approximated by Schrödinger's continuous wave function, where $noise^k$ determines the
"collapses" of the wave function. Since we prefer simple, formally describable explanations over
complex, nondescribable ones, we assume the history of our universe has a finite description
indeed.

This assumption has dramatic consequences. For instance, because we know that our 
future lies among the few (countably many) describable futures, we can ignore uncountably 
many nondescribable ones. Can we also make more specific predictions? Does it make sense 
to say some describable futures are necessarily more likely than others? To answer such 
questions we will examine possible probability distributions on possible futures, assuming 
that not only the histories themselves but also their probabilities are formally describable. 
Since most (uncountably many) real-valued probabilities are not, this assumption, against
which there is no physical evidence, actually represents a major inductive bias, which
turns out to be strong enough to explain certain hitherto unexplained aspects of our world.



Example 1.2 (In which universe am I?) Let h(y) represent a property of any possibly
infinite bitstring y, say, h(y) = 1 if y represents the history of a universe inhabited by a
particular observer (say, yourself) and h(y) = 0 otherwise. According to the weak anthropic
principle [24], the conditional probability of finding yourself in a universe compatible with
your existence equals 1. But there may be many y's satisfying h(y) = 1. What is the
probability that y = x, where x is a particular universe satisfying h(x) = 1? According to
Bayes,

$$P(x = y \mid h(y) = 1) = \frac{P(h(y) = 1 \mid x = y) \, P(x = y)}{\sum_{z : h(z) = 1} P(z)} \propto P(x), \qquad (1)$$

where P(A | B) denotes the probability of A, given knowledge of B, and the denominator is
just a normalizing constant. So the probability of finding yourself in universe x is essentially
determined by P(x), the prior probability of x.
Each prior P stands for a particular "theory of everything" or TOE. Once we know
something about P we can start making informed predictions. Parts of this paper deal with
the question: what are plausible properties of P? One very plausible assumption is that P is
approximable for all finite prefixes $\bar{x}$ of x in the following sense. There exists a possibly never
halting computer which outputs a sequence of numbers $T(t, \bar{x})$ at discrete times t = 1, 2, ...
in response to input $\bar{x}$ such that for each real $\epsilon > 0$ there exists a finite time $t_0$ such that for
all $t > t_0$:

$$| P(\bar{x}) - T(t, \bar{x}) | < \epsilon. \qquad (2)$$



Approximability in this sense is essentially equivalent to formal describability (Lemma 2.1
will make this more precise). We will show (Section 5) that the mild assumption above adds
enormous predictive power to the weak anthropic principle: it makes universes describable
by short algorithms immensely more likely than others. Any particular universe evolution
is highly unlikely if it is determined not only by simple physical laws but also by additional
truly random or noisy events. To a certain extent, this will justify "Occam's razor",
which expresses the ancient preference of simple solutions over complex ones, and which
is widely accepted not only in physics and other inductive sciences, but even in the fine arts.
All of this will require an extension of earlier work on Solomonoff's algorithmic probability,
universal priors, Kolmogorov complexity (or algorithmic information), and their refinements
[27]. We will prove several theorems concerning approximable and enumerable objects and
probabilities (see outline below). These theorems shed light on the structure of all formally
describable objects and extend traditional computability theory; hence they should also be of
interest without motivation through describable universes.

The calculation of the subjects of these theorems, however, may occasionally require 
excessive time, itself often not even computable in the classic sense. This will eventually 
motivate a shift of focus to the temporal complexity of "computing everything" (Section
6). If you were to sit down and write a program that computes all possible universes, which
would be the best way of doing so? Somewhat surprisingly, a modification of Levin Search 
can simultaneously compute all computable universes in an interleaving fashion that 
outputs each individual universe as quickly as its fastest algorithm running just by itself, 
save for a constant factor independent of the universe's size. This suggests a more restricted 
TOE that singles out those infinite universes computable with countable time and space 
resources, and a natural resource-based prior measure S on them. Given this "speed prior" 
S, we will show that the most likely continuation of a given observed history is computable 
by a fast and short algorithm (Section 6).

The S-based TOE will provoke quite specific prophecies concerning our own universe
(Section 7.5). For instance, the probability that it will last $2^n$ times longer than it has lasted
so far is at most $2^{-n}$. Furthermore, all apparently random events, such as beta decay or
collapses of Schrödinger's wave function of the universe, actually must exhibit yet unknown,
possibly nonlocal, regular patterns reflecting subroutines (e.g., pseudorandom generators) of
our universe's algorithm that are not only short but also fast.
1.1 Outline of Main Results



Some of the novel results herein may be of interest to theoretical computer scientists and
mathematicians, some to researchers in the fields of machine learning and
inductive inference (the science of making predictions based on observations), some
to physicists, and some to philosophers. The less formal sections might help those
usually uninterested in technical details to decide whether they would also like to delve into
the more formal ones. In what follows, we summarize the main contributions and
provide pointers to the most important theorems.

Section 2 introduces universal Turing Machines (TMs) more general than those considered
in previous related work: unlike traditional TMs, General TMs or GTMs may edit their
previous outputs (compare inductive TMs), and Enumerable Output Machines (EOMs)
may do this provided the output does not decrease lexicographically. We will define: a
formally describable object x has a finite, never halting GTM program that computes x
such that each output bit is revised at most finitely many times; that is, each finite prefix
of x eventually stabilizes (Defs. 2.1 - 2.5); describable functions can be implemented by such
programs (Def. 2.10); weakly decidable problems have solutions computable by never halting
programs whose output is wrong for at most finitely many steps (Def. 2.11). Theorem 2.1
generalizes the halting problem by demonstrating that it is not weakly decidable whether
a finite string is a description of a describable object (compare a related result for analytic
TMs by Hotz, Vierke and Schieffer).

Section 3 generalizes the traditional concept of Kolmogorov complexity or algorithmic
information of finite x (the length of the shortest halting program computing
x) to the case of objects describable by nonhalting programs on EOMs and GTMs (Defs.
3.2 - 3.4). It is shown that the generalization for EOMs is describable, but the one for GTMs
is not (Theorem 3.1). Certain objects are much more compactly encodable on EOMs than
on traditional monotone TMs, and Theorem 3.3 shows that there are also objects with short
GTM descriptions yet incompressible on EOMs and therefore "more random" than Chaitin's
$\Omega$, the halting probability of a TM with random input, which is incompressible only on
monotone TMs. This yields a natural TM type-specific complexity hierarchy expressed by
an inequality given in that section.

Section 4 discusses probability distributions on describable objects as well as the
nondescribable convergence probability of a GTM (Def. 4.14). It also introduces describable
(semi)measures as well as cumulatively enumerable measures (CEMs, Def. 4.5), where
the cumulative probability of all strings lexicographically greater than a given string x is
EOM-computable or enumerable. Theorem 4.1 shows that there is a universal CEM that
dominates all other CEMs, in the sense that it assigns higher probability to any finite y,
save for a constant factor independent of y. This probability is shown to be equivalent
to the probability that an EOM whose input bits are chosen randomly produces an output
starting with y (see the corresponding Corollary and Lemma 4.2). The nonenumerable universal
CEM also dominates enumerable priors studied in previous work by Solomonoff, Levin and
others. A further theorem shows that there is no universal approximable measure (proof by
M. Hutter).

Section 5 establishes relationships between generalized Kolmogorov complexity and generalized
algorithmic probability, extending previous work on enumerable semimeasures by
Levin, Gács, and others [100]. For instance, Theorem 5.3 shows that the
universal CEM assigns a probability to each enumerable object proportional to 1/2 raised to
the power of the length of its minimal EOM-based description, times a small corrective factor.
Similarly, objects with approximable probabilities yet without very short descriptions
on GTMs are necessarily very unlikely a priori (Theorems 5.4 and 5.5). Additional suspected
links between generalized Kolmogorov complexity and probability are expressed in form of
Conjectures 5.1 - 5.3.

Section 6 addresses issues of temporal complexity ignored in the previous sections on
describable universe histories (whose computation may require excessive time without recursive
bounds). In Subsection 6.2, Levin's universal search algorithm [55] (which takes
into account program runtime in an optimal fashion) is modified to obtain the fastest way of
computing all "S-describable" universes computable within countable time (Def. 6.1, Section
6.3); uncountably many other universes are ignored because they do not even exist from a
constructive point of view. A postulate then introduces a natural resource-oriented bias
reflecting constraints of whoever calculated our universe (possibly as a by-product of a search
for something else): we assign to universes prior probabilities inversely proportional to the
time and space resources consumed by the most efficient way of computing them. Given the
resulting "speed prior" S (Def. 6.5) and past observations x, Theorem 6.1 and Corollary 6.1
demonstrate that the best way of predicting a future y is to minimize the Levin complexity
of (x, y).

Section 7 puts into perspective the algorithmic priors (recursive and enumerable) introduced
in previous work on inductive inference by Solomonoff and others [83]
as well as the novel priors discussed in the present paper (cumulatively enumerable,
approximable, resource-optimal). Collectively they yield an entire spectrum of algorithmic
TOEs. We evaluate the plausibility of each prior being the one from which our own universe
is sampled, discuss its connection to "Occam's razor" as well as certain physical and
philosophical consequences, argue that the resource-optimal speed prior S may be the most
plausible one (Section 7.4), analyze the inference problem from the point of view of an
observer evolving in a universe sampled from S, make appropriate
predictions for our own universe (Section 7.5), and discuss their falsifiability.



2 Preliminaries 
2.1 Notation 

Much but not all of the notation used here is similar or identical to the one used in the
standard textbook on Kolmogorov complexity by Li and Vitányi.

Since sentences over any finite alphabet are encodable as bitstrings, without loss of generality
we focus on the binary alphabet $B = \{0, 1\}$. $\lambda$ denotes the empty string, $B^*$ the set of
finite sequences over B, $B^\infty$ the set of infinite sequences over B, $B^\sharp = B^* \cup B^\infty$.
$x, y, z, z^1, z^2$ stand for strings in $B^\sharp$. If $x \in B^*$ then xy is the concatenation of x and y
(e.g., if x = 10000 and y = 1111 then xy = 100001111). Let us order $B^\sharp$ lexicographically: if x
precedes y alphabetically (like in the example above) then we write $x \prec y$ or $y \succ x$; if x may
also equal y then we write $x \preceq y$ or $y \succeq x$ (e.g., $\lambda \prec 001 \prec 010 \prec 1 \prec 1111\ldots$).
The context will make clear where we also identify $x \in B^*$ with a unique nonnegative integer 1x
(e.g., string 0100 is represented by integer 10100 in the dyadic system or
$20 = 1 \cdot 2^4 + 0 \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 0 \cdot 2^0$ in the decimal system).
Indices $i, j, m, m_0, m_1, n, n_0, t, t_0$ range over the positive integers,
constants $c, c_0, c_1$ over the positive reals, f, g denote functions mapping integers to integers,
log the logarithm with basis 2, $lg(r) = \max_k \{\text{integer } k : 2^k \leq r\}$ for real r > 0.
For $x \in B^* \setminus \{\lambda\}$, 0.x stands for the real number with dyadic expansion x (note that
$0.x0111\ldots = 0.x1 = 0.x10 = 0.x100\ldots$ for $x \in B^*$, although $x0111\ldots \neq x1 \neq x10 \neq x100\ldots$).
For $x \in B^*$, l(x) denotes the number of bits in x, where $l(x) = \infty$ for $x \in B^\infty$; $l(\lambda) = 0$.
$x_n$ is the prefix of x consisting of the first n bits, if $l(x) \geq n$, and x otherwise ($x_0 := \lambda$). For
those $x \in B^*$ that contain at least one 0-bit, x' denotes the lexicographically smallest $y \succ x$
satisfying $l(y) \leq l(x)$ (x' is undefined for x of the form 111...1). We write f(n) = O(g(n))
if there exist $c, n_0$ such that $f(n) \leq c g(n)$ for all $n > n_0$.
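
Several of the conventions above can be checked mechanically on finite bitstrings. The following is a minimal sketch, assuming bitstrings are represented as Python strings over "0" and "1"; the function names (dyadic_integer, dyadic_real, prefix, successor) are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def dyadic_integer(x: str) -> int:
    """Integer identified with finite bitstring x: read '1' followed by x in binary."""
    return int("1" + x, 2)


def dyadic_real(x: str) -> Fraction:
    """Exact value of the real 0.x for a nonempty finite bitstring x."""
    return Fraction(int(x, 2), 2 ** len(x))


def prefix(x: str, n: int) -> str:
    """x_n: the first n bits of x, or x itself if it has fewer than n bits."""
    return x[:n]


def successor(x: str) -> str:
    """x': the lexicographically smallest y after x with l(y) <= l(x)."""
    stem = x.rstrip("1")          # trailing 1-bits cannot be increased in place
    if not stem:
        raise ValueError("x' is undefined for strings of the form 11...1")
    return stem[:-1] + "1"        # flip the last 0-bit and drop what followed it


# Python's built-in string comparison coincides with the lexicographic order
# used in the text: a proper prefix precedes its extensions, '0' precedes '1'.
ordered = sorted(["1", "010", "1111", "", "001"])
```

Here dyadic_integer("0100") reproduces the integer 20 of the example in the text, and dyadic_real makes the identity 0.x1 = 0.x10 = 0.x100 explicit for finite strings. Infinite strings are outside the scope of this sketch.
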

2.2 Turing Machines: Monotone TMs (MTMs), General TMs 
(GTMs), Enumerable Output Machines (EOMs) 

The standard model of theoretical computer science is the Turing Machine (TM). It allows 
for emulating any known computer. For technical reasons we will consider several types of 
TMs. 

Monotone TMs (MTMs). Most current theory of description size and inductive 
inference is based on MTMs (compare [p. 276 ff]) with several tapes, each tape being a
finite chain of adjacent squares with a scanning head initially pointing to the leftmost square. 
There is one output tape and at least two work tapes (sufficient to compute everything 
traditionally regarded as computable). The MTM has a finite number of internal states, 
one of them being the initial state. MTM behavior is specified by a lookup table mapping 
current state and contents of the squares above work tape scanning heads to a new state and 
an instruction to be executed next. There are instructions for shifting work tape scanning 
heads one square left or right (appending new squares when necessary), and for writing 0 or
1 on squares above work tape scanning heads. The only input-related instruction requests an
input bit determined by an external process and copies it onto the square above the first work 
tape scanning head. There may or may not be a halt instruction to terminate a computation. 
Sequences of requested input bits are called self-delimiting programs because they convey all 
information about their own length, possibly causing the MTM to halt, or at
least to cease requesting new input bits (the typical case in this paper). MTMs are called
monotone because they have a one-way write-only output tape. They cannot edit their
previous output, because the only output instructions are: append a new square at the right
end of the output tape and fill it with 0/1.

General TMs (GTMs). GTMs are like MTMs but have additional output instructions 
to edit their previous output. Our motivation for introducing GTMs is that certain bitstrings 
are compactly describable on nonhalting GTMs but not on MTMs, as will be seen later. This 
has consequences for definitions of individual describability and probability distributions on 
describable things. The additional instructions are: (a) shift output scanning head right/left 
(but not out of bounds); (b) delete square at the right end of the output tape (if it is not the
initial square or above the scanning head); (c) write 1 or 0 on square above output scanning
head. Compare Burgin's inductive TMs and super-recursive algorithms.



Enumerable Output Machines (EOMs). Like GTMs, EOMs can edit their previous 
output, but not such that it decreases lexicographically. The expressive power of EOMs lies in 
between those of MTMs and GTMs, with interesting computability-related properties whose 
analogues do not hold for GTMs. EOMs are like MTMs, except that the only permitted 
output instruction sequences are: (a) shift output tape scanning head left/right unless this 
leads out of bounds; (b) replace bitstring starting above the output scanning head by the 
string to the right of the scanning head of the second work tape, readjusting output tape 
size accordingly, but only if this lexicographically increases the contents of the output tape. 
The necessary test can be hardwired into the finite TM transition table. 



2.3 Infinite Computations, Convergence, Formal Describability 

Most traditional computability theory focuses on properties of halting programs. Given an
MTM or EOM or GTM T with halt instruction and $p, x \in B^*$, we write

$$T(p) = x \qquad (3)$$

for "p computes x on T and halts". Much of this paper, however, deals with programs that
never halt, and with TMs that do not need halt instructions.



Definition 2.1 (Convergence) Let $p \in B^\sharp$ denote the input string or program read by TM
T. Let $T_t(p)$ denote T's finite output string after t instructions. We say that p and p's output
stabilize and converge towards $x \in B^\sharp$ iff for each n satisfying $0 \leq n \leq l(x)$ there exists a
positive integer $t_n$ such that for all $t \geq t_n$: $T_t(p)_n = x_n$ and $l(T_t(p)) \leq l(x)$. Then we write

$$T(p) \leadsto x. \qquad (4)$$



Although each beginning or prefix of x eventually becomes stable during the possibly infi- 
nite computation, there need not be a halting program that computes an upper bound of 
stabilization time, given any p and prefix size. Compare the concept of computability in the
limit [39, 35, 34].



Definition 2.2 (TM-Specific Individual Describability) Given a TM T, an $x \in B^\sharp$ is
T-describable or T-computable iff there is a finite $p \in B^*$ such that $T(p) \leadsto x$.



Objects with infinite shortest descriptions on T are not T-describable. 



Definition 2.3 (Universal TMs) Let C denote a set of TMs. C has a universal element
if there is a TM $U^C \in C$ such that for each $T \in C$ there exists a constant string $p_T \in B^*$
(the compiler) such that for all possible programs p, if $T(p) \leadsto x$ then $U^C(p_T p) \leadsto x$.

Definition 2.4 (M, E, G) Let M denote the set of MTMs, E denote the set of EOMs, G
denote the set of GTMs.

M, E, G all have universal elements, according to the fundamental compiler theorem (for
instance, a fixed compiler can translate arbitrary LISP programs into equivalent FORTRAN
programs).

Definition 2.5 (Individual Describability) Let C denote a set of TMs with universal
element $U^C$. Some $x \in B^\sharp$ is C-describable or C-computable if it is $U^C$-describable.
E-describable strings are called enumerable. G-describable strings are called formally
describable or simply describable.



Example 2.1 (Pseudorandom universe based on halting problem) Let x be a universe
history in the style of Example 1.1. Suppose its pseudorandom generator's n-th output
bit PRG(n) is 1 if the n-th program of an ordered list of all possible programs halts, and
0 otherwise. Since PRG(n) is describable, x is too. But there is no halting algorithm computing
PRG(n) for all n, otherwise the halting problem would be solvable, which it is not.
Hence in general there is no computer that outputs x and only x without ever editing
some previously computed history.
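
The way the bits PRG(n) of Example 2.1 settle can be imitated on a toy scale. The sketch below is only an illustration under strong assumptions: "programs" are modelled as Python generators that yield once per computation step and return when they halt, only a finite list of them is simulated rather than an enumeration of all programs, and the names halts_after, loops_forever and prg_history are hypothetical.

```python
from typing import Callable, Iterator, List


def halts_after(n: int) -> Callable[[], Iterator[None]]:
    """Toy program factory: the program takes n steps and then halts."""
    def program() -> Iterator[None]:
        for _ in range(n):
            yield
    return program


def loops_forever() -> Iterator[None]:
    """Toy program that never halts."""
    while True:
        yield


def prg_history(programs: List[Callable[[], Iterator[None]]], steps: int) -> List[str]:
    """Return the tentative bitstring PRG(1)PRG(2)... after each simulation round.

    Bit n starts as 0 and is edited to 1 once the n-th program is seen to halt.
    """
    running = [make() for make in programs]
    bits = [0] * len(programs)
    history = []
    for _ in range(steps):
        for n, simulation in enumerate(running):
            # Advance each unfinished program by one step; exhaustion means "halted".
            if bits[n] == 0 and next(simulation, "halted") == "halted":
                bits[n] = 1
        history.append("".join(str(b) for b in bits))
    return history


history = prg_history([halts_after(2), loops_forever, halts_after(0)], 4)
```

Each bit is edited at most once, so every finite prefix stabilizes, yet an observer of the output never knows whether a bit that is still 0 will be revised later. That is the sense in which the example needs edits of previously computed output.
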



Definition 2.6 (Always converging TMs) TM T always converges if for all of its possible
programs $p \in B^\sharp$ there is an $x \in B^\sharp$ such that $T(p) \leadsto x$.

For example, MTMs and EOMs converge always. GTMs do not.

Definition 2.7 (Approximability) Let 0.x denote a real number, $x \in B^* \setminus \{\lambda\}$. 0.x is
approximable by TM T if there is a $p \in B^*$ such that for each real $\epsilon > 0$ there exists a $t_0$
such that

$$| 0.x - 0.T_t(p) | < \epsilon$$

for all times $t > t_0$. 0.x is approximable if there is at least one GTM T as above.

Lemma 2.1 If 0.x is approximable, then x is describable, and vice versa.



2.4 Formally Describable Functions 

Much of the traditional theory of computable functions focuses on halting programs that 
map subsets of B* to subsets of B*. The output of a program that does not halt is usually 
regarded as undefined, which is occasionally expressed by notation such as $T(p) = \infty$. In this
paper, however, we will not lump together all the possible outputs of nonhalting programs
onto a single symbol "undefined." Instead we will consider mappings from subsets of $B^*$ to
subsets of $B^\sharp$, sometimes from $B^\sharp$ to $B^\sharp$.

Definition 2.8 (Encoding $B^*$) Encode $x \in B^*$ as a self-delimiting input p(x) for an
appropriate TM, using

$$l(p(x)) = l(x) + 2 \log l(x) + O(1) \qquad (5)$$

bits as follows: write l(x) in binary notation, insert a "0" after every "0" and a "1" after
every "1," append "01" to indicate the end of the description of the size of the following
string, then append x.

For instance, x = 01101 gets encoded as p{x) = 1100110101101. 
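
The encoding of Def. 2.8 is simple enough to write down directly. The following is a minimal sketch, assuming bitstrings are Python strings over "0" and "1"; the function names are hypothetical, and the decoder is an addition for illustration that the text itself does not spell out.

```python
from typing import Tuple


def encode_self_delimiting(x: str) -> str:
    """p(x): doubled binary digits of l(x), then the marker '01', then x itself."""
    length_bits = format(len(x), "b")                    # l(x) in binary notation
    doubled = "".join(bit * 2 for bit in length_bits)    # '0' -> '00', '1' -> '11'
    return doubled + "01" + x


def decode_self_delimiting(stream: str) -> Tuple[str, str]:
    """Read one encoded string from the front of stream; return (x, unread rest)."""
    i = 0
    length_bits = []
    while stream[i:i + 2] in ("00", "11"):               # doubled digits of l(x)
        length_bits.append(stream[i])
        i += 2
    if stream[i:i + 2] != "01" or not length_bits:
        raise ValueError("malformed length description")
    i += 2
    n = int("".join(length_bits), 2)
    if len(stream) < i + n:
        raise ValueError("stream ends before the announced length")
    return stream[i:i + n], stream[i + n:]


encoded = encode_self_delimiting("01101")
```

Because the decoder knows where the encoded string ends without looking past it, further input can follow immediately, which is the property that makes such programs self-delimiting.
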

Definition 2.9 (Recursive Functions) A function $h : D_1 \subset B^* \to D_2 \subset B^*$ is recursive
if there is a TM T using the encoding of Def. 2.8 such that for all $x \in D_1$: T(p(x)) = h(x).

Definition 2.10 (Describable Functions) Let T denote a TM using the encoding of Def.
2.8. A function $h : D_1 \subset B^* \to D_2 \subset B^\sharp$ is T-describable if for all $x \in D_1$: $T(p(x)) \leadsto h(x)$.
Let C denote a set of TMs using the encoding of Def. 2.8, with universal element $U^C$. h is
C-describable or C-computable if it is $U^C$-computable. If the T above is universal among the
GTMs with such input encoding (see Def. 2.3) then h is describable.

Compare functions in the arithmetic hierarchy and the concept of $\Delta^0_n$-describability, e.g.
[p. 46-47].



2.5 Weak Decidability and Convergence Problem 



Traditionally, decidability of some problem class implies there is a halting algorithm that 
prints out the answer, given a problem from the class. We now relax the notion of decidability 
by allowing for infinite computations on EOMs or GTMs whose answers converge after 
finite yet possibly unpredictable time. Essentially, an answer needs to be correct for almost 
all the time, and may be incorrect for at most finitely many initial time steps (compare
computability in the limit and super-recursive algorithms).

Definition 2.11 (Weak decidability) Consider a characteristic function $h : D_1 \subset B^* \to B$:
h(x) = 1 if x satisfies a certain property, and h(x) = 0 otherwise. The problem of
deciding whether or not some $x \in D_1$ satisfies that property is weakly decidable if h(x) is
describable (compare Def. 2.10).



Example 2.2 Is a given string $p \in B^*$ a halting program for a given MTM? The problem
is not decidable in the traditional sense (no halting algorithm solves the general halting
problem), but weakly decidable and even E-decidable, by a trivial algorithm: print "0"
on first output square; simulate the MTM on work tapes and apply it to p; once it halts
after having read no more than l(p) bits, print "1" on first output square.
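
The trivial algorithm of Example 2.2 can be imitated in a few lines. This is a hedged sketch, not a TM simulation: the simulated program is modelled as a Python generator that yields once per step and returns when it halts, the bookkeeping of how many input bits were read is omitted, and the names limit_halting_guess, halts_after and loops_forever are hypothetical.

```python
from itertools import islice
from typing import Callable, Iterator


def halts_after(n: int) -> Callable[[], Iterator[None]]:
    """Toy program factory: the program takes n steps and then halts."""
    def program() -> Iterator[None]:
        for _ in range(n):
            yield
    return program


def loops_forever() -> Iterator[None]:
    """Toy program that never halts."""
    while True:
        yield


def limit_halting_guess(program: Callable[[], Iterator[None]]) -> Iterator[int]:
    """Yield the content of the first output square after each time step."""
    guess = 0
    yield guess                 # print "0" on the first output square
    for _ in program():         # simulate the program step by step
        yield guess             # still running: the tentative answer stays 0
    guess = 1                   # the simulation halted: overwrite with "1"
    while True:
        yield guess


halting_trace = list(islice(limit_halting_guess(halts_after(3)), 6))
looping_trace = list(islice(limit_halting_guess(loops_forever), 5))
```

The answer is wrong for only finitely many initial steps when the program halts and is never wrong otherwise, and it changes only from 0 to 1, in line with the remark that the problem is even E-decidable.
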

Example 2.3 It is weakly decidable whether a finite bitstring p is a program for a given
TM. Algorithm: print "0"; feed p bitwise into the internally simulated TM whenever it 
requests a new input bit; once the TM has requested l{p) bits, print "1"; if it requests an 
additional bit, print "0". After finite time the output will stabilize forever. 



Theorem 2.1 (Convergence Problem) Given a GTM, it is not weakly decidable whether 
a finite bitstring is a converging program, or whether some of the output bits will fluctuate 
forever. 



Proof. A proof conceptually quite similar to the one below was given by Hotz, Vierke
and Schieffer in the context of analytic TMs derived from R-Machines (the
alphabet of analytic TMs is real-valued instead of binary). Version 1.0 of this paper was
written without awareness of this work. Nevertheless, the proof in Version 1.0 is repeated
here because it does serve illustrative purposes.

In a straightforward manner we adapt Turing's proof of the undecidability of the MTM
halting problem, a reformulation of Gödel's celebrated result, using the diagonalization
trick whose roots date back to Cantor's proof that one cannot count the real numbers.
Let us write $T(x) \downarrow$ if there is a $z \in B^\sharp$ such that $T(x) \leadsto z$. Let us write $T(x) \updownarrow$ if T's
output fluctuates forever in response to x (e.g., by flipping from 1 to zero and back forever).
Let $A_1, A_2, \ldots$ be an effective enumeration of all GTMs. Uniquely encode all pairs of finite
strings (x, y) in $B^* \times B^*$ as finite strings $code(x, y) \in B^*$. Suppose there were a GTM U
such that (*): for all $x, y \in B^*$: $U(code(x, y)) \leadsto 1$ if $A_x(y) \downarrow$, and $U(code(x, y)) \leadsto 0$
otherwise. Then one could construct a GTM T with $T(x) \leadsto 1$ if $U(code(x, x)) \leadsto 0$, and
$T(x) \updownarrow$ otherwise. Let y be the index of $T = A_y$, then $A_y(y) \downarrow$ if $U(code(y, y)) \leadsto 0$, otherwise
$A_y(y) \updownarrow$. By (*), however, $U(code(y, y)) \leadsto 1$ if $A_y(y) \downarrow$, and $U(code(y, y)) \leadsto 0$ if $A_y(y) \updownarrow$.
Contradiction. □



3 Complexity of Constructive Descriptions 



Throughout this paper we focus on TMs with self-delimiting programs [52]. Traditionally,
the Kolmogorov complexity or algorithmic complexity or algorithmic
information of $x \in B^*$ is the length of the shortest halting program computing x:

Definition 3.1 (Kolmogorov Complexity K) Fix a universal MTM or EOM or GTM
U with halt instruction, and define

$$K(x) = \min \{ l(p) : U(p) = x \}. \qquad (6)$$

Let us now extend this to nonhalting GTMs.

3.1 Generalized Kolmogorov Complexity for EOMs and GTMs

Definition 3.2 (Generalized $K_T$) Given any TM T, define

$$K_T(x) = \min \{ l(p) : T(p) \leadsto x \}.$$

Compare Schnorr's "process complexity" for MTMs.



Definition 3.3 {K^^ ^ K^, based on Invariance Theorem) Consider Def. Let 



C denote a set of TMs with universal TM (T E C). We drop the index T, writing 

K^{x) = Kuc{x) < Kt{x) + 0(1). 



This is justified by an appropriate Invariance Theorem ||50| , p2| , pG] : there is a positive 
constant c such that Kjjc{x) < Kt{x) + c for all x, since the size of the compiler that 
translates arbitrary programs for T into equivalent programs for U'" does not depend on x. 

Definition 3.4 {KniT, Km^^ , Km^ , Km'^) Given TM T and x G B* , define 

KmT{x) = min{/(p) : T{p) ^ xy,y E B^}. (7) 

Consider Def . \2.4\ - If C denotes a set of TMs with universal TM , then define Km^{x) = 
Kmifc{x). 



KmP is a generalization of Schnorr's ||7^ and Levin's complexity measure Km^ for 
MTMs. 



Describability issues. $K(x)$ is not computable by a halting program [?], but obviously G-computable or describable; the $z$ with $0.z = 1/K(x)$ is even enumerable. Even $K^E(x)$ is describable, using the following algorithm:

Run all EOM programs in "dovetail style" such that the $n$-th step of the $i$-th program is executed in the $(n+i)$-th phase ($i = 1, 2, \ldots$); whenever a program outputs $x$, place it (or its prefix read so far) in a tentative list $L$ of $x$-computing programs or program prefixes; whenever an element of $L$ produces output $\succ x$, delete it from $L$; whenever an element of $L$ requests an additional input bit, update $L$ accordingly. After every change of $L$ replace the current estimate of $K^E(x)$ by the length of the shortest element of $L$. This estimate will eventually stabilize forever.
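The dovetailing schedule used by this algorithm can be made concrete with a small sketch. It only illustrates the scheduling rule (step $n$ of program $i$ runs in phase $n+i$); the name `dovetail_phases` is hypothetical, and no actual EOM is simulated.

```python
from itertools import count, islice


def dovetail_phases():
    """Yield (phase, i, n): in phase n + i, run step n of program i.

    Programs and steps are both numbered from 1, so the first phase is 2.
    Each phase is finite, hence every (program, step) pair is reached
    after finitely many steps even though there are infinitely many programs.
    """
    for phase in count(2):
        for i in range(1, phase):      # program index i = 1 .. phase - 1
            n = phase - i              # step index n >= 1
            yield phase, i, n


if __name__ == "__main__":
    for phase, i, n in islice(dovetail_phases(), 6):
        print(f"phase {phase}: step {n} of program {i}")
```

Because step $n+1$ of a program always falls into a later phase than its step $n$, each program is executed in its natural order while no single nonhalting program can block the others.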



Theorem 3.1 $K^G(x)$ is not describable.

Proof. Identify finite bitstrings with the integers they represent. If $K^G(x)$ were describable then also

$$h(x) = \max_y \{ K^G(y) : 1 \leq y \leq g(x) \}, \tag{8}$$

where $g$ is any fixed recursive function, and also

$$f(x) = \min_y \{ y : K^G(y) = h(x) \}. \tag{9}$$

Since the number of descriptions $p$ with $l(p) < n - O(1)$ cannot exceed $2^{n - O(1)}$, but the number of strings $x$ with $l(x) = n$ equals $2^n$, most $x$ cannot be compressed by more than $O(1)$ bits; that is, $K^G(x) \geq \log x - O(1)$ for most $x$. From (9) we therefore obtain $K^G(f(x)) > \log g(x) - O(1)$ for large enough $x$, because $f(x)$ picks out one of the incompressible $y \leq g(x)$. However, obviously we also would have $K^G(f(x)) \leq l(x) + 2\log l(x) + O(1)$, using the encoding of Def. 2.8. Contradiction for quickly growing $g$ with low complexity, such as $g(x) = 2^{2^x}$. □

3.2 Expressiveness of EOMs and GTMs 

On their internal work tapes MTMs can compute whatever GTMs can compute. But they commit themselves forever once they print out some bit. They are ill-suited to the case where the output may require subsequent revision after time intervals unpredictable in advance (compare Example [?]). Alternative MTMs that print out sequences of result updates (separated by, say, commas) would compute other things besides the result, and hence not satisfy the "don't compute anything else" aspect of individual describability. Recall from the introduction that in a certain sense there are uncountably many collectively describable strings, but only countably many individually describable ones.

Since GTMs may occasionally rewrite parts of their output, they are computationally more expressive than MTMs in the sense that they permit much more compact descriptions of certain objects. For instance, $K(x) - K^G(x)$ is unbounded, as will be seen next. This will later have consequences for predictions, given certain observations.

Theorem 3.2 $K(x) - K^G(x)$ is unbounded.



Proof. Define

$$h'(x) = \max_y \{ K(y) : 1 \leq y \leq g(x) \}; \quad f'(x) = \min_y \{ y : K(y) = h'(x) \}, \tag{10}$$

where $g$ is recursive. Then $K^G(f'(x)) = O(l(x) + K(g))$ (where $K(g)$ is the size of the minimal halting description of function $g$), but $K(f'(x)) > \log g(x) - O(1)$ for sufficiently large $x$ (compare the proof of Theorem 3.1). Therefore $K(f'(x)) - K^G(f'(x)) \geq O(\log g(x))$ for infinitely many $x$ and quickly growing $g$ with low complexity. □



3.2.1 EOMs More Expressive Than MTMs 



Similarly, some $x$ are compactly describable on EOMs but not on MTMs. To see this, consider Chaitin's $\Omega$, the halting probability of an MTM whose input bits are obtained by tossing an unbiased coin whenever it requests a new bit [?]. $\Omega$ is enumerable (dovetail over all programs $p$ and sum up the contributions $2^{-l(p)}$ of the halting $p$), but there is no recursive upper bound on the number of instructions required to compute $\Omega_n$, given $n$. This implies $K(\Omega_n) = n + O(1)$ and also $K^M(\Omega_n) = n + O(1)$. It is easy to see, however, that on nonhalting EOMs there are much more compact descriptions:

$$K^E(\Omega_n) \leq O(K(n)) \leq O(\log n); \tag{11}$$

that is, there is no upper bound of

$$K^M(\Omega_n) - K^E(\Omega_n). \tag{12}$$

3.2.2 GTMs More Expressive Than EOMs: Objects Less Regular Than $\Omega$

We will now show that there are describable strings that have a short GTM description yet are "even more random" than Chaitin's Omegas, in the sense that even on EOMs they do not have any compact descriptions.

Theorem 3.3 For all $n$ there are $z \in B^*$ with

$$K^E(z) > n - O(1), \quad \text{yet} \quad K^G(z) \leq O(\log n).$$

That is, $K^E(z) - K^G(z)$ is unbounded.

Proof. For $x \in B^* \setminus \{\lambda\}$ and universal EOM $T$ define

$$\Xi(x) = \sum_{y \in B^\sharp :\, 0.y > 0.x} \ \sum_{p :\, T(p) \leadsto y} 2^{-l(p)}. \tag{13}$$

First note that the dyadic expansion of $\Xi(x)$ is EOM-computable or enumerable. The algorithm works as follows:

Algorithm A: Initialize the real-valued variable $V$ by 0, run all possible programs of EOM $T$ dovetail style such that the $n$-th step of the $i$-th program is executed in the $(n+i)$-th phase; whenever the output of a program prefix $q$ starts with some $y$ satisfying $0.y > 0.x$ for the first time, set $V := V + 2^{-l(q)}$; henceforth ignore continuations of $q$.

$V$ approximates $\Xi(x)$ from below in enumerable fashion; infinite $p$ are not worrisome as $T$ must only read a finite prefix of $p$ to observe $0.y > 0.x$ if the latter holds indeed. We will now show that knowledge of $\Xi(x)_n$, the first $n$ bits of $\Xi(x)$, allows for constructing a bitstring $z$ with $K^E(z) \geq n - O(1)$ when $x$ has low complexity.

Suppose we know $\Xi(x)_n$. Once algorithm A above yields $V > \Xi(x)_n$ we know that no programs $p$ with $l(p) < n$ will contribute any more to $V$. Choose the shortest $z$ satisfying $0.z = (0.y_{min} - 0.x)/2$, where $y_{min}$ is the lexicographically smallest $y$ previously computed by algorithm A such that $0.y > 0.x$. Then $z$ cannot be among the strings $T$-describable with fewer than $n$ bits. Using the Invariance Theorem (compare Def. 3.3) we obtain $K^E(z) \geq n - O(1)$.

While prefixes of $\Omega$ are greatly compressible on EOMs, $z$ is not. On the other hand, $z$ is compactly G-describable: $K^G(z) \leq K(x) + K(n) + O(1)$. For instance, choosing a low-complexity $x$, we have $K^G(z) \leq O(K(n)) \leq O(\log n)$. □

The discussion above reveals a natural complexity hierarchy. Ignoring additive constants, we have

$$K^G(x) \leq K^E(x) \leq K^M(x), \tag{14}$$

where for each "$\leq$" relation above there are $x$ which allow for replacing "$\leq$" by "$<$".



4 Measures and Probability Distributions 

Suppose $x$ represents the history of our universe up until now. What is its most likely continuation $y \in B^\sharp$? Bayes' theorem yields

$$P(xy \mid x) = \frac{P(x \mid xy)\,P(xy)}{\sum_{z \in B^\sharp} P(xz)} = \frac{P(xy)}{N(x)} \propto P(xy), \tag{15}$$

where $P(z^2 \mid z^1)$ is the probability of $z^2$, given knowledge of $z^1$, and

$$N(x) = \sum_{z \in B^\sharp} P(xz) \tag{16}$$

is a normalizing factor. The most likely continuation $y$ is determined by $P(xy)$, the prior probability of $xy$ (compare the similar Equation (1)). Now what are the formally describable ways of assigning prior probabilities to universes? In what follows we will first consider describable semimeasures on $B^*$, then probability distributions on $B^\sharp$.
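As a toy illustration of the Bayesian ranking of continuations, the sketch below uses a hypothetical prior over a handful of finite strings. The numbers are invented for illustration and are not taken from any prior studied in this paper; the names `PRIOR` and `posterior` are likewise assumptions.

```python
from fractions import Fraction

# Hypothetical toy prior over a few complete "histories" (finite strings);
# the values are made up for illustration and sum to 1.
PRIOR = {
    "0000": Fraction(1, 2),
    "0101": Fraction(1, 4),
    "0110": Fraction(1, 8),
    "1111": Fraction(1, 8),
}


def posterior(x, prior=PRIOR):
    """Return {y: P(xy | x)} for all continuations y with P(xy) > 0.

    The normalizer is the sum of P(xz) over all continuations z of x.
    """
    mass = {s[len(x):]: p for s, p in prior.items() if s.startswith(x)}
    norm = sum(mass.values())          # normalizing factor N(x)
    if norm == 0:
        raise ValueError("observed prefix has zero prior probability")
    return {y: p / norm for y, p in mass.items()}


if __name__ == "__main__":
    post = posterior("01")
    print(post)                        # continuations "01" and "10"
    print(max(post, key=post.get))     # most likely continuation
```

Since the normalizer depends only on the observed prefix, the ranking of continuations is the same as the ranking by prior probability of the full string.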

4.1 Dominant and Universal (Semi) Measures 

The next three definitions concerning semimeasures on $B^*$ are almost but not quite identical to those of discrete semimeasures [56, p. 245 ff] and continuous semimeasures [56, p. 272 ff] based on the work of Levin and Zvonkin [100].

Definition 4.1 (Semimeasures) A (binary) semimeasure $\mu$ is a function $B^* \to [0,1]$ that satisfies:

$$\mu(\lambda) = 1; \quad \mu(x) \geq 0; \quad \mu(x) = \mu(x0) + \mu(x1) + \bar{\mu}(x), \tag{17}$$

where $\bar{\mu}$ is a function $B^* \to [0,1]$ satisfying $0 \leq \bar{\mu}(x) \leq \mu(x)$.

A notational difference to the approach of Levin [100] (who writes $\mu(x) \geq \mu(x0) + \mu(x1)$) is the explicit introduction of $\bar{\mu}$. Compare the introduction of an undefined element $u$ by Li and Vitanyi [56, p. 281]. Note that $\sum_{x \in B^*} \bar{\mu}(x) \leq 1$. Later we will discuss the interesting case $\bar{\mu}(x) = P(x)$, the a priori probability of $x$.

Definition 4.2 (Dominant Semimeasures) A semimeasure $\mu_0$ dominates another semimeasure $\mu$ if there is a constant $c_\mu > 0$ such that for all $x \in B^*$

$$\mu_0(x) \geq c_\mu \mu(x). \tag{18}$$

Definition 4.3 (Universal Semimeasures) Let $\mathcal{M}$ be a set of semimeasures on $B^*$. A semimeasure $\mu_0 \in \mathcal{M}$ is universal if it dominates all $\mu \in \mathcal{M}$.

In what follows, we will introduce describable semimeasures dominating those considered in previous work ([100], [56, p. 245 ff, p. 272 ff]).



4.2 Universal Cumulatively Enumerable Measure (CEM) 

Definition 4.4 (Cumulative measure $C\mu$) For semimeasure $\mu$ on $B^*$ define the cumulative measure $C\mu$:

$$C\mu(x) := \sum_{y \succeq x :\, l(y) = l(x)} \mu(y) + \sum_{y \succ x :\, l(y) < l(x)} \bar{\mu}(y). \tag{19}$$

Note that we could replace "$l(x)$" by "$l(x) + c$" in the definition above. Recall that $x'$ denotes the smallest $y \succ x$ with $l(y) \leq l(x)$ ($x'$ may be undefined). We have

$$\mu(x) = C\mu(x) \ \text{if}\ x = 11\ldots1; \quad \text{else}\ \mu(x) = C\mu(x) - C\mu(x'). \tag{20}$$

Definition 4.5 (CEMs) Semimeasure $\mu$ is a CEM if $C\mu(x)$ is enumerable for all $x \in B^*$.

Then $\mu(x)$ is the difference of two finite enumerable values, according to (20).
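The relation between a semimeasure and its cumulative measure can be checked numerically on a small example. The sketch below is only an illustration under two assumptions: the toy semimeasure $\mu(x) = 3^{-l(x)}$ is invented for this purpose, and the order $\succ$ is taken to be Python's built-in string order (a proper prefix precedes its extensions, and 0 precedes 1). All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def mu(x):
    """Toy semimeasure (an assumption for illustration): mu(x) = 3^-l(x)."""
    return Fraction(1, 3 ** len(x))


def mu_bar(x):
    """Defect term of the semimeasure: mu(x) - mu(x0) - mu(x1)."""
    return mu(x) - mu(x + "0") - mu(x + "1")


def strings_up_to(length):
    """All bitstrings of length 0..length."""
    for k in range(length + 1):
        for bits in product("01", repeat=k):
            yield "".join(bits)


def c_mu(x):
    """Cumulative measure: mu over same-length y >= x, mu_bar over shorter y > x."""
    same_len = sum(mu(y) for y in strings_up_to(len(x))
                   if len(y) == len(x) and y >= x)
    shorter = sum(mu_bar(y) for y in strings_up_to(len(x))
                  if len(y) < len(x) and y > x)
    return same_len + shorter


def successor(x):
    """x': the smallest y > x with l(y) <= l(x), or None if undefined."""
    larger = [y for y in strings_up_to(len(x)) if y > x]
    return min(larger) if larger else None


def mu_from_cumulative(x):
    """Recover mu(x) as C mu(x) - C mu(x'), or C mu(x) if x' is undefined."""
    nxt = successor(x)
    return c_mu(x) if nxt is None else c_mu(x) - c_mu(nxt)


if __name__ == "__main__":
    for x in strings_up_to(3):
        assert mu_from_cumulative(x) == mu(x)
    print("mu is recovered from C mu for all strings up to length 3")
```

Exact rational arithmetic (`Fraction`) is used so that the identity is checked without rounding error.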
Theorem 4.1 There is a universal CEM. 

Proof. We first show that one can enumerate the CEMs, then construct a universal CEM from the enumeration. Note the differences to Levin's related proofs that there is a universal discrete semimeasure and a universal enumerable semimeasure [100, 52], and Li and Vitanyi's presentation of the latter [56, p. 273 ff], attributed to J. Tyszkiewicz.

Without loss of generality, consider only EOMs without halt instruction and with fixed input encoding of $B^*$ according to Def. 2.8. Such EOMs are enumerable, and correspond to an effective enumeration of all enumerable functions from $B^*$ to $B^\sharp$. Let $EOM_i$ denote the $i$-th EOM in the list, and let $EOM_i(x,n)$ denote its output after $n$ instructions when applied to $x \in B^*$. The following procedure filters out those $EOM_i$ that already represent CEMs, and transforms the others into representations of CEMs, such that we obtain a way of generating all and only CEMs.

FOR all $i$ DO in dovetail fashion:
    START: let $V\mu_i(x)$ and $V\bar{\mu}_i(x)$ and $VC\mu_i(x)$ denote variable functions on $B^*$. Set $V\mu_i(\lambda) := V\bar{\mu}_i(\lambda) := VC\mu_i(\lambda) := 1$, and $V\mu_i(x) := V\bar{\mu}_i(x) := VC\mu_i(x) := 0$ for all other $x \in B^*$. Define $VC\mu_i(x') := 0$ for undefined $x'$. Let $z$ denote a string variable.
    FOR $n = 1, 2, \ldots$ DO:
        (1) Lexicographically order and rename all $x$ with $l(x) \leq n$: $x^1 := \lambda \prec x^2 := 0 \prec x^3 \prec \ldots \prec x^{2^{n+1}-1} := \underbrace{11\ldots1}_{n}$.
        (2) FOR $k = 2^{n+1} - 1$ down to 1 DO:
            (2.1) Systematically search for the smallest $m \geq n$ such that $z := EOM_i(x^k, m) \neq \lambda$ AND $0.z \geq VC\mu_i(x^{k+1})$ if $k < 2^{n+1} - 1$; set $VC\mu_i(x^k) := 0.z$.
        (3) For all $x \succ \lambda$ satisfying $l(x) \leq n$, set $V\mu_i(x) := VC\mu_i(x) - VC\mu_i(x')$. For all $x$ with $l(x) < n$, set $V\bar{\mu}_i(x) := V\mu_i(x) - V\mu_i(x1) - V\mu_i(x0)$. For all $x$ with $l(x) = n$, set $V\bar{\mu}_i(x) := V\mu_i(x)$.



If $EOM_i$ indeed represents a CEM $\mu_i$ then each search process in (2.1) will terminate, and the $VC\mu_i(x)$ will enumerate the $C\mu_i(x)$ from below, and the $V\mu_i(x)$ and $V\bar{\mu}_i(x)$ will approximate the true $\mu_i(x)$ and $\bar{\mu}_i(x)$, respectively, not necessarily from below though. Otherwise there will be a nonterminating search at some point, leaving $V\mu_i$ from the previous loop as a trivial CEM. Hence we can enumerate all CEMs, and only those. Now define (compare [52]):

$$\mu_0(x) = \sum_{n>0} \alpha_n \mu_n(x), \quad \bar{\mu}_0(x) = \sum_{n>0} \alpha_n \bar{\mu}_n(x), \quad \text{where}\ \alpha_n > 0,\ \sum_n \alpha_n = 1,$$

and $\alpha_n$ is an enumerable constant, e.g., $\alpha_n = \frac{6}{\pi^2 n^2}$ or $\alpha_n = \frac{1}{n(n+1)}$ (note a slight difference to Levin's classic approach which just requests $\sum_n \alpha_n \leq 1$). Then $\mu_0$ dominates every $\mu_n$ by Def. 4.2, and is a semimeasure according to Def. 4.1:

$$\mu_0(\lambda) = 1; \quad \mu_0(x) \geq 0; \quad \mu_0(x) = \sum_{n>0} \alpha_n [\mu_n(x0) + \mu_n(x1) + \bar{\mu}_n(x)] = \mu_0(x0) + \mu_0(x1) + \bar{\mu}_0(x). \tag{21}$$

$\mu_0$ also is a CEM by Def. 4.5, because

$$C\mu_0(x) = \sum_{y \succeq x :\, l(x) = l(y)} \sum_{n>0} \alpha_n \mu_n(y) + \sum_{y \succ x :\, l(x) > l(y)} \sum_{n>0} \alpha_n \bar{\mu}_n(y) = \sum_{n>0} \alpha_n C\mu_n(x) \tag{22}$$

is enumerable, since $\alpha_n$ and $C\mu_n(x)$ are (dovetail over all $n$). That is, $\mu_0(x)$ is approximable as the difference of two enumerable finite values, according to Equation (20). □
4.3 Approximable and Cumulatively Enumerable Distributions

To deal with infinite $x$, we will now extend the treatment of semimeasures on $B^*$ in the previous subsection by discussing probability distributions on $B^\sharp$.

Definition 4.6 (Probabilities) A probability distribution $P$ on $x \in B^\sharp$ satisfies

$$P(x) \geq 0; \quad \sum_x P(x) = 1.$$

Definition 4.7 (Semidistributions) A semidistribution $P$ on $x \in B^\sharp$ satisfies

$$P(x) \geq 0; \quad \sum_x P(x) \leq 1.$$

Definition 4.8 (Dominant Distributions) A distribution $P_0$ dominates another distribution $P$ if there is a constant $c_P > 0$ such that for all $x \in B^\sharp$:

$$P_0(x) \geq c_P P(x). \tag{23}$$

Definition 4.9 (Universal Distributions) Let $\mathcal{P}$ be a set of probability distributions on $x \in B^\sharp$. A distribution $P_0 \in \mathcal{P}$ is universal if for all $P \in \mathcal{P}$: $P_0$ dominates $P$.

Theorem 4.2 There is no universal approximable semidistribution. 

Proof. The following proof is due to M. Hutter (personal communications by email following a discussion of enumerable and approximable universes on 2 August 2000 in Munich). It is an extension of a modified¹ proof [56, p. 249 ff] that there is no universal recursive semimeasure.

It suffices to focus on $x \in B^*$. Identify strings with integers, and assume $P(x)$ is a universal approximable semidistribution. We construct an approximable semidistribution $Q(x)$ that is not dominated by $P(x)$, thus contradicting the assumption. Let $P_0(x), P_1(x), \ldots$ be a sequence of recursive functions converging to $P(x)$. We recursively define a sequence $Q_0(x), Q_1(x), \ldots$ converging to $Q(x)$. The basic idea is: each contribution to $Q$ is the sum of $n$ consecutive $P$ probabilities ($n$ increasing). Define $Q_0(x) := 0$; $I_n := \{y : n^2 \leq y < (n+1)^2\}$. Let $n$ be such that $x \in I_n$. Define $j_t^n$ ($k_t^n$) as the element with smallest $P_t$ (largest $Q_{t-1}$) probability in this interval, i.e., $j_t^n := \mathrm{minarg}_{x \in I_n} P_t(x)$ ($k_t^n := \mathrm{maxarg}_{x \in I_n} Q_{t-1}(x)$). If $n \cdot P_t(k_t^n)$ is less than twice and $n \cdot P_t(j_t^n)$ is more than half of $Q_{t-1}(k_t^n)$, set $Q_t(x) = Q_{t-1}(x)$. Otherwise set $Q_t(x) = n \cdot P_t(j_t^n)$ for $x = j_t^n$ and $Q_t(x) = 0$ for $x \neq j_t^n$. $Q_t(x)$ is obviously total recursive and non-negative. Since $2n \leq |I_n|$, we have

$$\sum_{x \in I_n} Q_t(x) \leq 2n \cdot P_t(j_t^n) = 2n \cdot \min_{x \in I_n} P_t(x) \leq \sum_{x \in I_n} P_t(x).$$

Summing over $n$ we observe that if $P_t$ is a semidistribution, so is $Q_t$. From some $t_0$ on, $P_t(x)$ changes by less than a factor of 2 since $P_t(x)$ converges to $P(x) > 0$. Hence $Q_t(x)$ remains unchanged for $t \geq t_0$ and converges to $Q(x) := Q_\infty(x) = Q_{t_0}(x)$. But $Q(j_{t_0}^n) = Q_{t_0}(j_{t_0}^n) \geq n \cdot P_{t_0}(j_{t_0}^n) \geq \frac{1}{2} n \cdot P(j_{t_0}^n)$, violating our universality assumption $P(x) \geq c \cdot Q(x)$. □

¹ As pointed out by M. Hutter (14 Nov. 2000, personal communication) and even earlier by A. Fujiwara (1998, according to P. M. B. Vitanyi, personal communication, 21 Nov. 2000), the proof on the bottom of p. 249 of [56] should be slightly modified. For instance, the sum could be taken over $x_{i-1} < x \leq x_i$. The sequence of inequalities $\sum_{x_{i-1} < x \leq x_i} P(x) > x_i P(x_i)$ is then satisfiable by a suitable $x_i$ sequence, since $\liminf_{x \to \infty} \{x P(x)\} = 0$. The basic idea of the proof is correct, of course, and very useful.

Definition 4.10 (Cumulatively Enumerable Distributions, CEDs) A distribution $P$ on $B^\sharp$ is a CED if $CP(x)$ is enumerable for all $x \in B^*$, where

$$CP(x) := \sum_{y \in B^\sharp :\, y \succeq x} P(y). \tag{24}$$



4.4 TM-Induced Distributions and Convergence Probability 

Suppose TM $T$'s input bits are obtained by tossing an unbiased coin whenever a new one is requested. Levin's universal discrete enumerable semimeasure [?], [35] or semidistribution $m$ is limited to $B^*$ and halting programs:

Definition 4.11 ($m$)

$$m(x) = \sum_{p :\, T(p) = x} 2^{-l(p)}. \tag{25}$$

Note that $\sum_x m(x) < 1$ if $T$ is universal. Let us now generalize this to $B^\sharp$ and nonhalting programs:

Definition 4.12 ($P_T$, $KP_T$) Suppose $T$'s input bits are obtained by tossing an unbiased coin whenever a new one is requested.

$$P_T(x) = \sum_{p :\, T(p) \leadsto x} 2^{-l(p)}, \quad KP_T(x) = -\lg P_T(x)\ \text{for}\ P_T(x) > 0, \tag{26}$$

where $x, p \in B^\sharp$.



Program Continua. According to Def. 4.12, most infinite $x$ have zero probability, but not those with finite programs, such as the dyadic expansion of $0.5\sqrt{2}$. However, a nonvanishing part of the entire unit of probability mass is contributed by continua of mostly incompressible strings, such as those with cumulative probability $2^{-l(q)}$ computed by the following class of uncountably many infinite programs with a common finite prefix $q$: "repeat forever: read and print next input bit." The corresponding traditional measure-oriented notation for

$$\sum_{x :\, T(qx) \leadsto x} 2^{-l(qx)} = 2^{-l(q)}$$

would be

$$\int_{0.q}^{0.q + 2^{-l(q)}} dx = 2^{-l(q)}.$$

For notational simplicity, however, we will continue using the $\sum$ sign to indicate summation over uncountable objects, rather than using a measure-oriented notation for probability densities. The reader should not feel uncomfortable with this: the theorems in the remainder of the paper will focus on those $x \in B^\sharp$ with $P(x) > 0$; density-like nonzero sums over uncountably many bitstrings, each with individual measure zero, will not play any critical role in the proofs.

Definition 4.13 (Universal TM-Induced Distributions $P^C$; $KP^C$) If $C$ denotes a set of TMs with universal element $U^C$, then we write

$$P^C(x) = P_{U^C}(x); \quad KP^C(x) := -\lg P^C(x)\ \text{for}\ P^C(x) > 0. \tag{27}$$

We have $P^C(x) > 0$ for $x \in D^C \subset B^\sharp$, the subset of $C$-describable $x \in B^\sharp$. The attribute universal is justified, because of the dominance $P_T(x) = O(P^C(x))$, due to the Invariance Theorem (compare Def. 3.3).

Since all programs of EOMs and MTMs converge, $P^E$ and $P^M$ are proper probability distributions on $B^\sharp$. For instance, $\sum_x P^E(x) = 1$. $P^G$, however, is just a semidistribution. To obtain a proper probability distribution $PN_T$, one might think of normalizing by the convergence probability $\Upsilon$:

Definition 4.14 (Convergence Probability) Given GTM $T$, define

$$PN_T(x) = \frac{\sum_{p :\, T(p) \leadsto x} 2^{-l(p)}}{\Upsilon_T}, \quad \text{where} \quad \Upsilon_T = \sum_{p :\, \exists x :\, T(p) \leadsto x} 2^{-l(p)}.$$

Describability issues. Uniquely encode each TM $T$ as a finite bitstring, and identify $M$, $E$, $G$ with the corresponding sets of bitstrings. While the function $f^M : M \to B^\sharp$ : $f^M(T) = \Omega_T$ is describable, even enumerable, the function $f^G : G \to B^\sharp$ : $f^G(T) = \Upsilon_T$ is not, essentially due to Theorem 2.1.

Even $P^E(x)$ and $P^M(x)$ are generally not describable for $x \in B^\sharp$, in the sense that there is no GTM $T$ that takes as an input a finite description (or program) of any M-describable or E-describable $x \in B^\sharp$ and converges towards $P^M(x)$ or $P^E(x)$. This is because in general it is not even weakly decidable (Def. 2.11) whether two programs compute the same output. If we know that one of the program outputs is finite, however, then the conditions of weak decidability are fulfilled. Hence certain TM-induced distributions on $B^*$ are describable, as will be seen next.

Definition 4.15 (TM-Induced Cumulative Distributions) If $C$ denotes a set of TMs with universal element $U^C$, then we write (compare Def. 4.10):

$$CP^C(x) = CP_{U^C}(x). \tag{28}$$

Lemma 4.1 For $x \in B^*$, $CP^E(x)$ is enumerable.

Proof. The following algorithm computes $CP^E(x)$ (compare proof of Theorem 3.3):

Initialize the real-valued variable $V$ by 0, run all possible programs of EOM $T$ dovetail style; whenever the output of a program prefix $q$ starts with some $y \succeq x$ for the first time, set $V := V + 2^{-l(q)}$; henceforth ignore continuations of $q$.

In this way $V$ enumerates $CP^E(x)$. Infinite $p$ are not problematic as only a finite prefix of $p$ must be read to establish $y \succeq x$ if the latter indeed holds. □
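The enumeration from below used in this proof can be mimicked on a toy example. The sketch is a strong simplification under stated assumptions: instead of dovetailing over a real EOM, it replays a hypothetical, hand-written table of program prefixes, their outputs, and the step at which each output is first observed; Python's string order stands in for the lexicographic order. The names `TOY_RUNS` and `enumerate_cp` are invented for illustration.

```python
from fractions import Fraction

# Hypothetical toy machine: each entry is (program prefix q, output y, step at
# which y is first seen). The prefixes form a prefix-free set. All values are
# invented for illustration; a real EOM would be simulated by dovetailing.
TOY_RUNS = [
    ("0", "01", 5),
    ("10", "1", 2),
    ("110", "001", 9),
    ("111", "0110", 3),
]


def enumerate_cp(x, runs=TOY_RUNS):
    """Yield the successive values of V, a nondecreasing lower bound.

    A prefix q contributes 2^-l(q) once its output y satisfies y >= x,
    using Python's string order as the assumed lexicographic order.
    """
    v = Fraction(0)
    for q, y, _step in sorted(runs, key=lambda r: r[2]):
        if y >= x:
            v += Fraction(1, 2 ** len(q))
            yield v


if __name__ == "__main__":
    print(list(enumerate_cp("01")))    # increasing lower bounds
```

The values never decrease, which is the defining property of an enumerable quantity: at no finite time is it known how close the current bound is to the limit.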

Similarly, facts of the form $y \succ x \in B^*$ can be discovered after finite time.

Corollary 4.1 For $x \in B^*$, $P^E(x)$ is approximable or describable as the difference of two enumerable values:

$$P^E(x) = \sum_{y \succeq x} P^E(y) - \sum_{y \succ x} P^E(y). \tag{29}$$

Now we will make the connection to the previous subsection on semimeasures on $B^*$.

4.5 Universal TM-Induced Measures 

Definition 4.16 ($P$-Induced Measure $\mu P$) Given a distribution $P$ on $B^\sharp$, define a measure $\mu P$ on $B^*$ as follows:

$$\mu P(x) = \sum_{z \in B^\sharp} P(xz). \tag{30}$$

Note that $\bar{\mu}P(x) = P(x)$ (compare Def. 4.1):

$$\mu P(\lambda) = 1; \quad \mu P(x) = P(x) + \mu P(x0) + \mu P(x1). \tag{31}$$

For those $x \in B^*$ without 0-bit we have $\mu P(x) = CP(x)$, for the others

$$\mu P(x) = CP(x) - CP(x'). \tag{32}$$

Definition 4.17 (TM-Induced Semimeasures $\mu_T$, $\mu^M$, $\mu^E$, $\mu^G$) Given some TM $T$, for $x \in B^*$ define $\mu_T(x) = \mu P_T(x)$. Again we deviate a bit from Levin's $B^*$-oriented path [100] (survey: [56, p. 245 ff, p. 272 ff]) and extend $\mu_T$ to $x \in B^\infty$, where we define $\mu_T(x) = \bar{\mu}_T(x) = P_T(x)$. If $C$ denotes a set of TMs with universal element $U^C$, then we write

$$\mu^C(x) = \mu_{U^C}(x); \quad K\mu^C(x) := -\lg \mu^C(x)\ \text{for}\ \mu^C(x) > 0. \tag{33}$$

We observe that $\mu^C$ is universal among all $T$-induced semimeasures, $T \in C$. Note that

$$\mu^C(x) = \mu^C(x0) + \mu^C(x1) + P^C(x)\ \text{for}\ x \in B^*; \quad \mu^C(x) = P^C(x)\ \text{for}\ x \in B^\infty. \tag{34}$$

It will be obvious from the context when we deal with the restriction of $\mu^C$ to $B^*$.

Corollary 4.2 For $x \in B^*$, $\mu^E(x)$ is a CEM and approximable as the difference of two enumerable values: $\mu^E(x) = CP^E(x)$ for $x$ without any 0-bit, otherwise

$$\mu^E(x) = CP^E(x) - CP^E(x'). \tag{35}$$
4.6 Universal CEM vs EOM with Random Input

Corollary 4.3 and Lemma 4.2 below imply that $\mu^E$ and $\mu_0$ are essentially the same thing: randomly selecting the inputs of a universal EOM yields output prefixes whose probabilities are determined by the universal CEM.

Corollary 4.3 Let $\mu_0$ denote the universal CEM of Theorem 4.1. For $x \in B^*$,

$$\mu^E(x) = O(\mu_0(x)).$$

Lemma 4.2 For $x \in B^*$,

$$\mu_0(x) = O(\mu^E(x)).$$

Proof. In the enumeration of EOMs in the proof of Theorem 4.1, let $EOM_0$ be an EOM representing $\mu_0$. We build an EOM $T$ such that $\mu_T(x) = \mu_0(x)$. The rest follows from the Invariance Theorem (compare Def. 3.3).

$T$ applies $EOM_0$ to all $x \in B^*$ in dovetail fashion, and simultaneously simply reads randomly selected input bits forever. At a given time, let string variable $z$ denote $T$'s input string read so far. Starting at the right end of the unit interval $[0, 1)$, as the $V\bar{\mu}_0(x)$ are being updated by the algorithm of Theorem 4.1, $T$ keeps updating a chain of finitely many, variable, disjoint, consecutive, adjacent, half-open intervals $VI(x)$ of size $V\bar{\mu}_0(x)$ in alphabetic order on $x$, such that $VI(y)$ is to the right of $VI(x)$ if $y \succ x$. After every variable update and each increase of $z$, $T$ replaces its output by the $x$ of the $VI(x)$ with $0.z \in VI(x)$. Since neither $z$ nor the $VC\mu_0(x)$ in the algorithm of Theorem 4.1 can decrease (that is, all interval boundaries can only shift left), $T$'s output cannot either, and therefore is indeed EOM-computable. Obviously the following holds:

$$C\mu P_T(x) = CP_T(x) = C\mu_0(x)$$

and

$$\mu P_T(x) = \sum_{z \in B^\sharp} P_T(xz) = \mu_0(x).$$

□



5 Probability vs Descriptive Complexity 

The size of some computable object's minimal description is closely related to the object's probability. For instance, Levin [?] proved the remarkable Coding Theorem for his universal discrete enumerable semimeasure $m$ based on halting programs (see Def. 4.11); compare independent work by Chaitin, who also gives credit to N. Pippenger:

Theorem 5.1 (Coding Theorem) For $x \in B^*$,

$$-\log m(x) \leq K(x) \leq -\log m(x) + O(1). \tag{36}$$
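The left-hand inequality of the Coding Theorem can be illustrated on a toy machine. The sketch below is not a universal machine: it is a hypothetical, hand-written prefix-free program table (`TOY_MACHINE`, invented for illustration), so it only shows why the shortest program alone already bounds $-\log m(x)$ from above, not the harder right-hand inequality.

```python
import math
from fractions import Fraction

# Hypothetical prefix-free program table of a toy halting machine
# (program -> output); invented for illustration only.
TOY_MACHINE = {
    "0": "00",
    "10": "01",
    "110": "00",
    "1110": "1",
}


def m(x, machine=TOY_MACHINE):
    """Sum of 2^-l(p) over all programs p whose output is x."""
    total = Fraction(0)
    for program, output in machine.items():
        if output == x:
            total += Fraction(1, 2 ** len(program))
    return total


def K(x, machine=TOY_MACHINE):
    """Length of the shortest program whose output is x."""
    return min(len(p) for p, out in machine.items() if out == x)


if __name__ == "__main__":
    for x in sorted(set(TOY_MACHINE.values())):
        print(x, float(m(x)), -math.log2(float(m(x))), K(x))
```

Since the shortest program contributes $2^{-K(x)}$ to the sum, $m(x) \geq 2^{-K(x)}$ holds on any such table; additional programs for the same output can only increase $m(x)$.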
In this special case, the contributions of the shortest programs dominate the probabilities of objects computable in the traditional sense. As shown by Gacs [36] for the case of MTMs, however, contrary to Levin's conjecture, $\mu^M(x) \neq O(2^{-Km^M(x)})$; but a slightly worse bound does hold:

Theorem 5.2

$$K\mu^M(x) - 1 \leq Km^M(x) \leq K\mu^M(x) + Km^M(K\mu^M(x)) + O(1). \tag{37}$$

The term $-1$ on the left-hand side stems from the definition of $\lg(x) \leq \log(x)$. We will now consider the case of probability distributions that dominate $m$, and semimeasures that dominate $\mu^M$, starting with the case of enumerable objects.



5.1 Theorems for EOMs and GTMs 

Theorem 5.3 For $x \in B^\sharp$ with $P^E(x) > 0$,

$$KP^E(x) - 1 \leq K^E(x) \leq KP^E(x) + K^E(KP^E(x)) + O(1). \tag{38}$$

Using $K^E(y) \leq \log y + 2 \log\log y + O(1)$ for $y$ interpreted as an integer (compare Def. 2.8), this yields

$$2^{-K^E(x)} < P^E(x) \leq O(2^{-K^E(x)})\,(K^E(x))^2. \tag{39}$$

That is, objects that are hard to describe (in the sense that they have only long enumerating descriptions) have low probability.
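The step from the complexity bound of Theorem 5.3 to the upper bound on $P^E(x)$ can be spelled out as follows. This is a sketch under two assumptions that are not stated explicitly above: $\lg y$ differs from $\log y$ by less than one, and $KP^E(x)$ is large enough for all logarithms to be positive.

```latex
\begin{align*}
K^E(x) &\le KP^E(x) + K^E\bigl(KP^E(x)\bigr) + O(1)
   && \text{(Theorem 5.3)} \\
 &\le KP^E(x) + \log KP^E(x) + 2\log\log KP^E(x) + O(1)
   && \text{(bound on } K^E(y)\text{)} \\
2^{-K^E(x)} &\ge \frac{c \, 2^{-KP^E(x)}}{KP^E(x)\,\log^2 KP^E(x)}
   && \text{(for some constant } c > 0\text{)} \\
P^E(x) &\le O\bigl(2^{-K^E(x)}\bigr)\, KP^E(x)\, \log^2 KP^E(x)
   && \text{(since } 2^{-KP^E(x)} \ge P^E(x)/2\text{)} \\
 &\le O\bigl(2^{-K^E(x)}\bigr)\, \bigl(K^E(x)\bigr)^2
   && \text{(since } KP^E(x) \le K^E(x) + 1\text{)}
\end{align*}
```

The last line trades the factor $\log^2 KP^E(x)$ for another factor of order $K^E(x)$, which is generous but gives the compact form stated above.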

Proof. The left-hand inequality follows by definition. To show the right-hand side, one can build an EOM $T$ that computes $x \in B^\sharp$ using not more than $KP^E(x) + K_T(KP^E(x)) + O(1)$ input bits in a way inspired by Huffman-Coding [?]. The claim then follows from the Invariance Theorem. The trick is to arrange $T$'s computation such that $T$'s output converges yet never needs to decrease lexicographically. $T$ works as follows:

(A) Emulate $U^E$ to construct a real enumerable number $0.s$ encoded as a self-delimiting input program $r$; simultaneously run all (possibly forever running) programs on $U^E$ dovetail style; whenever the output of a prefix $q$ of any running program starts with some $x \in B^*$ for the first time, set variable $V(x) := V(x) + 2^{-l(q)}$ (if no program has ever created output starting with $x$ then first create $V(x)$ initialized by 0); whenever the output of some extension $q'$ of $q$ (obtained by possibly reading additional input bits: $q' = q$ if none are read) lexicographically increases such that it does not equal $x$ any more, set $V(x) := V(x) - 2^{-l(q')}$.

(B) Simultaneously, starting at the right end of the unit interval $[0,1)$, as the $V(x)$ are being updated, keep updating a chain of disjoint, consecutive, adjacent, half-open (at the right end) intervals $IV(x) = [LV(x), RV(x))$ of size $V(x) = RV(x) - LV(x)$ in alphabetic order on $x$, such that the right end of the $IV(x)$ of the largest $x$ coincides with the right end of $[0, 1)$, and $IV(y)$ is to the right of $IV(x)$ if $y \succ x$. After every variable update and each change of $s$, replace the output of $T$ by the $x$ of the $IV(x)$ with $0.s \in IV(x)$.

This will never violate the EOM constraints: the enumerable $s$ cannot shrink, and since EOM outputs cannot decrease lexicographically, the interval boundaries $RV(x)$ and $LV(x)$ cannot grow (their negations are enumerable, compare Lemma 4.1), hence $T$'s output cannot decrease.

For $x \in B^*$ the $IV(x)$ converge towards an interval $I(x)$ of size $P^E(x)$. For $x \in B^\infty$ with $P^E(x) > 0$, we have: for any $\epsilon > 0$ there is a time $t_0$ such that for all time steps $t > t_0$ in $T$'s computation, an interval $I_\epsilon(x)$ of size $P^E(x) - \epsilon$ will be completely covered by certain $IV(y)$ satisfying $x \succ y$ and $0.x - 0.y < \epsilon$. So for $\epsilon \to 0$ the $I_\epsilon(x)$ also converge towards an interval $I(x)$ of size $P^E(x)$. Hence $T$ will output larger and larger $y$ approximating $x$ from below, provided $0.s \in I(x)$.

Since any interval of size $c$ within $[0, 1)$ contains a number $0.z$ with $l(z) = -\lg c$, in both cases there is a number $0.s$ (encodable by some $r$ satisfying $l(r) \leq l(s) + K_T(l(s)) + O(1)$) with $l(s) = -\lg P^E(x) + O(1)$, such that $T(r) \leadsto x$, and therefore $K_T(x) \leq l(s) + K_T(l(s)) + O(1)$. □
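The interval chain of step (B) in the proof above can be sketched for a fixed snapshot of the variables $V(x)$. This is only an illustration of the layout and lookup; the values in the usage example are invented, the names `interval_chain` and `select` are hypothetical, and the dynamic updating of the chain during the computation is not modelled.

```python
from fractions import Fraction


def interval_chain(v):
    """Lay out adjacent half-open intervals [L, R) of sizes v[x].

    Intervals are placed in alphabetic order of x, ending at the right end
    of [0, 1): the largest x gets the rightmost interval.
    """
    chain = {}
    right = Fraction(1)
    for x in sorted(v, reverse=True):     # largest x first
        left = right - v[x]
        chain[x] = (left, right)
        right = left
    return chain


def select(chain, s):
    """Return the x whose interval contains 0.s, or None if none does."""
    point = Fraction(int(s, 2), 2 ** len(s)) if s else Fraction(0)
    for x, (left, right) in chain.items():
        if left <= point < right:
            return x
    return None


if __name__ == "__main__":
    snapshot = {"0": Fraction(1, 4), "01": Fraction(1, 8), "1": Fraction(1, 2)}
    chain = interval_chain(snapshot)
    print(chain)
    print(select(chain, "011"))           # 0.011 = 3/8 lies in the interval of "01"
```

Any binary fraction $0.s$ inside the interval of $x$ serves as a code for $x$, and an interval of size $c$ always contains such a fraction with about $-\lg c$ bits, which is where the length bound in the proof comes from.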

Less symmetric statements can also be derived in very similar fashion:

Theorem 5.4 Let TM $T$ induce approximable $CP_T(x)$ for all $x \in B^*$ (compare Defs. 4.10 and 4.12; an EOM would be a special case). Then for $x \in B^\sharp$, $P_T(x) > 0$:

$$K^G(x) \leq KP_T(x) + K^G(KP_T(x)) + O(1). \tag{40}$$

Proof. Modify the proof of Theorem 5.3 for approximable as opposed to enumerable interval boundaries and approximable $0.s$. □

A similar proof, but without the complication for the case $x \in B^\infty$, yields:

Theorem 5.5 Let $\mu$ denote an approximable semimeasure on $x \in B^*$; that is, $\mu(x)$ is describable. Then for $\mu(x) > 0$:

$$Km^G(x) \leq K\mu(x) + Km^G(K\mu(x)) + O(1); \tag{41}$$

$$K^G(x) \leq K\bar{\mu}(x) + K^G(K\bar{\mu}(x)) + O(1). \tag{42}$$

As a consequence,

$$\frac{\mu(x)}{K\mu(x) \log^2 K\mu(x)} \leq O(2^{-Km^G(x)}); \quad \frac{\bar{\mu}(x)}{K\bar{\mu}(x) \log^2 K\bar{\mu}(x)} \leq O(2^{-K^G(x)}). \tag{43}$$

Proof. Initialize variables $V_\lambda := 1$ and $IV_\lambda := [0, 1)$. Dovetailing over all $x \succ \lambda$, approximate the GTM-computable $\bar{\mu}(x) = \mu(x) - \mu(x0) - \mu(x1)$ in variables $V_x$ initialized by zero, and create a chain of adjacent intervals $IV_x$ analogously to the proof of Theorem 5.3.

The $IV_x$ converge against intervals $I_x$ of size $\bar{\mu}(x)$. Hence $x$ is GTM-encodable by any program $r$ producing an output $s$ with $0.s \in I_x$: after every update, replace the GTM's output by the $x$ of the $IV_x$ with $0.s \in IV_x$. Similarly, if $0.s$ is in the union of adjacent intervals $I_y$ of strings $y$ starting with $x$, then the GTM's output will converge towards some string starting with $x$. The rest follows in a way similar to the one described in the final paragraph of the proof of Theorem 5.3. □



Using the basic ideas in the proofs of Theorems 5.3 and 5.5 in conjunction with Corollary 4.3 and Lemma 4.2, one can also obtain statements such as:

Theorem 5.6 Let $\mu_0$ denote the universal CEM from Theorem 4.1. For $x \in B^*$,

$$K\mu_0(x) - O(1) \leq Km^E(x) \leq K\mu_0(x) + Km^E(K\mu_0(x)) + O(1). \tag{44}$$

While $P^E$ dominates $P^M$ and $P^G$ dominates $P^E$, the reverse statements are not true. In fact, given the results from Sections 3.2 and [?], one can now make claims such as the following ones:

Corollary 5.1 The following functions are unbounded:

$$\frac{\mu^E(x)}{\mu^M(x)}, \quad \frac{P^E(x)}{P^M(x)}, \quad \frac{P^G(x)}{P^E(x)}.$$

Proof. For the cases $\mu^E$ and $P^E$, apply Theorems 5.2, 5.6 and the unboundedness of (12). For the case $P^G$, apply Theorems 3.3 and 5.3. □

5.2 Tighter Bounds? 

Is it possible to get rid of the small correction terms such as $K^E(KP^E(x)) \leq O(\log(-\log P^E(x)))$ in Theorem 5.3? Note that the construction in the proof shows that $K^E(x)$ is actually bounded by $K^E(s)$, the complexity of the enumerable number $0.s \in I(x)$ with minimal $K_T(s)$. The facts $\sum_x P^M(x) = 1$, $\sum_x P^E(x) = 1$, $\sum_x P^G(x) < 1$, as well as intuition and wishful thinking inspired by the Shannon-Fano Theorem [78] and the Coding Theorem 5.1, suggest there might indeed be tighter bounds:

Conjecture 5.1 For $x \in B^\sharp$ with $P^M(x) > 0$: $K^M(x) \leq KP^M(x) + O(1)$.

Conjecture 5.2 For $x \in B^\sharp$ with $P^E(x) > 0$: $K^E(x) \leq KP^E(x) + O(1)$.

Conjecture 5.3 For $x \in B^\sharp$ with $P^G(x) > 0$: $K^G(x) \leq KP^G(x) + O(1)$.

The work of Gacs has already shown, however, that analogous conjectures for semimeasures such as $\mu^M$ (as opposed to distributions) are false [36].
5.3 Between EOMs and GTMs?

The dominance of $P^G$ over $P^E$ comes at the expense of occasionally "unreasonable," nonconverging outputs. Are there classes of always converging TMs more expressive than EOMs? Consider a TM called a PEOM whose inputs are pairs of finite bitstrings $x, y \in B^*$ (code them using $2\log l(x) + 2\log l(y) + l(xy) + O(1)$ bits). The PEOM uses dovetailing to run all self-delimiting programs on the $y$-th EOM of an enumeration of all EOMs, to approximate the probability $PEOM(y, x)$ (again encoded as a string) that the EOM's output starts with $x$. $PEOM(y, x)$ is approximable (we may apply Theorem 5.5) but not necessarily enumerable. On the other hand, it is easy to see that PEOMs can compute all enumerable strings describable on EOMs. In this sense PEOMs are more expressive than EOMs, yet never diverge like GTMs. EOMs can encode some enumerable strings slightly more compactly, however, due to the PEOM's possibly unnecessarily bit-consuming input encoding. An interesting topic of future research may be to establish a partially ordered expressiveness hierarchy among classes of always converging TMs, and to characterize its top, if there is one, which we doubt. Candidates to consider may include TMs that approximate certain recursive or enumerable functions of enumerable strings.



6 Temporal Complexity 

So far we have completely ignored the time necessary to compute objects from programs.
In fact, the objects that are highly probable according to $P^G$ and $P^E$ and $\mu^E$ introduced in
the previous sections yet quite improbable according to less dominant priors studied earlier
(such as $\mu^M$ and recursive priors [100], [?], [?], [36], [?]) are precisely those whose computation
requires immense time. For instance, the time needed to compute the describable, even enumerable
$\Omega_n$ grows faster than any recursive function of n, as shown by Chaitin [?]. Analogue
statements hold for the z of Theorem 3.2. Similarly, many of the semimeasures discussed
above are approximable, but the approximation process is excessively time-consuming.

Now we will study the opposite extreme, namely, priors with a bias towards the fastest 
way of producing certain outputs. Without loss of generality, we will focus on computations 
on a universal MTM. For simplicity let us extend the binary alphabet such that it contains 
an additional output symbol "blank." 



6.1 Fast Computation of Finite and Infinite Strings 

There are many ways of systematically enumerating all computable objects or bitstrings. All 
take infinite time. Some, however, compute individual strings much faster than others. To 
see this, first consider the trivial algorithm "ALPHABET," which simply lists all bitstrings 
ordered by size and separated by blanks (compare Marchal's thesis [?] and Moravec's library
of all possible books [?]). ALPHABET will eventually create all initial finite segments
of all strings. For example, the nth bit of the string "11111111..." will appear as part
of ALPHABET's $2^n$-th output string. Note, however, that countably many steps are not
sufficient to print any infinite string of countable size!

There are much faster ways though. For instance, the algorithm used in the previous
paper on the computable universes sequentially computes all computable bitstrings by
a particular form of dovetailing. Let $p^i$ denote the i-th possible program. Program $p^1$ is run
for one instruction every second step (to simplify things, if the TM has a halt instruction and
has halted we assume nothing is done during this step; the resulting loss of efficiency is
not significant for what follows). Similarly, $p^2$ is run for one instruction every second of the
remaining steps, and so on.

Following Li and Vitanyi [?, p. 503 ff], let us call this popular dovetailer "SIMPLE." It
turns out that SIMPLE actually is the fastest in a certain sense. For instance, the nth bit of
string "11111111..." now will appear after at most $O(n)$ steps (as opposed to at least $O(n 2^n)$
steps for ALPHABET). Why? Let $p^k$ be the fastest algorithm that outputs "11111111...".
Obviously $p^k$ computes the n-th bit within $O(n)$ instructions. Now SIMPLE will execute
one instruction of $p^k$ every $2^k$ steps, since $p^k$ receives a fraction $2^{-k}$ of all steps. But $2^k$
is a positive constant that does not depend on n.

Generally speaking, suppose $p^k$ is among the fastest finite algorithms for string x and
computes $x_n$ within at most $O(f(n))$ instructions, for all n. Then x's first n symbols will
appear after at most $O(f(n))$ steps of SIMPLE. In this sense SIMPLE essentially computes
each string as quickly as its fastest algorithm, although it is in fact computing all computable
strings simultaneously. This may seem counterintuitive.
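
The gap between ALPHABET and SIMPLE on the string "11111111..." can be checked numerically.
The following Python sketch is illustrative only: the function names, the 1-based step
numbering, the exclusion of the empty string from ALPHABET's output, and the assumption
that $p^k$ emits one output bit per instruction are ours, not part of the original
construction.

```python
from itertools import count, product


def alphabet():
    """Yield all nonempty bitstrings ordered by size (a sketch of ALPHABET)."""
    for size in count(1):
        for bits in product("01", repeat=size):
            yield "".join(bits)


def alphabet_index_of_ones(n):
    """1-based position of the first ALPHABET output that starts with n ones."""
    target = "1" * n
    for index, s in enumerate(alphabet(), start=1):
        if s.startswith(target):
            return index


def simple_schedule(step):
    """Index k of the program p^k that SIMPLE serves at the given step (step >= 1).

    p^1 gets every second step, p^2 every second remaining step, and so on,
    so p^k is served at steps 2^(k-1), 3 * 2^(k-1), 5 * 2^(k-1), ...
    """
    k = 1
    while step % 2 == 0:
        step //= 2
        k += 1
    return k


def simple_steps_for(k, instructions):
    """Number of SIMPLE steps until p^k has executed `instructions` instructions."""
    done = 0
    for step in count(1):
        if simple_schedule(step) == k:
            done += 1
            if done == instructions:
                return step
```

Under these assumptions the position returned by `alphabet_index_of_ones(n)` grows
exponentially in n, while `simple_steps_for(k, n)` grows only linearly in n for fixed k,
with a constant factor of order $2^k$.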



6.2 FAST: The Most Efficient Way of Computing Everything 



Subsection 6.1 focused on SIMPLE "steps" allocated for instructions of single string-generating
algorithms. Note that each such step may require numerous "micro-steps" for the computational
overhead introduced by the need for organizing internal storage. For example, quickly
growing space requirements for storing all strings may force a dovetailing TM to frequently 
shift its writing and scanning heads across large sections of its internal tapes. This may 
consume more time than necessary. 

To overcome potential slow-downs of this kind, and to optimize the TM-specific "constant
factor," we will slightly modify an optimal search algorithm called "Levin search" [?] (see
[?], [73], [97], [76] for the first practical applications we are aware of). Essentially, we will
strip Levin search of its search aspects and apply it to possibly infinite objects. This leads to
the most efficient (up to a constant factor depending on the TM) algorithm for computing
all computable bitstrings.

FAST Algorithm: For i = 1, 2, . . . perform PHASE i:

PHASE i: Execute $2^{i - l(p)}$ instructions of all program prefixes p satisfying
$l(p) \le i$, and sequentially write the outputs on adjacent sections
of the output tape, separated by blanks.
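
The phase structure of FAST can be sketched in a few lines of Python. This is a toy
model under loud assumptions: instead of all program prefixes on a universal MTM, it
uses a hypothetical table of two prefix-free "programs" modeled as Python generators,
where each generator step counts as one instruction and may or may not emit an output
bit. The names `PROGRAMS`, `fast_phase` and `fast` are ours.

```python
from itertools import islice


def all_ones():
    """Toy program: emits one output bit '1' per instruction, forever."""
    while True:
        yield "1"


def alternating():
    """Toy program: emits '0', '1', '0', ... with one idle instruction after each bit."""
    bit = 0
    while True:
        yield str(bit)
        yield None  # an instruction that produces no output
        bit ^= 1


# Hypothetical prefix-free program table: bitstring p -> program body.
PROGRAMS = {"0": all_ones, "10": alternating}


def fast_phase(i, programs=PROGRAMS):
    """Run PHASE i: 2^(i - l(p)) instructions of every program p with l(p) <= i."""
    outputs = {}
    for p, body in programs.items():
        if len(p) <= i:
            budget = 2 ** (i - len(p))
            emitted = islice(body(), budget)  # restart p from scratch in each phase
            outputs[p] = "".join(b for b in emitted if b is not None)
    return outputs


def fast(max_phase, programs=PROGRAMS):
    """Run PHASE 1 .. max_phase and return the per-phase outputs."""
    return [fast_phase(i, programs) for i in range(1, max_phase + 1)]
```

In this sketch each phase recomputes the work of the previous one and then doubles it,
which mirrors the remark that repeating earlier phases costs only a constant factor.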

Following Levin [?], within $2^{k+1}$ TM steps, each of order $O(1)$ "micro-steps" (no excessive
computational overhead due to storage allocation etc.), FAST will generate all prefixes $x_n$
satisfying $Kt(x_n) \le k$, where $x_n$'s Levin complexity $Kt(x_n)$ is defined as

$Kt(x_n) = \min_q \{ l(q) + \log t(q, x_n) \}$,

where program prefix q computes $x_n$ in $t(q, x_n)$ time steps. The computational complexity
of the algorithm is not essentially affected by the fact that PHASE i = 2, 3, ..., repeats
the computation of PHASE i - 1 which for large i is approximately half as short (ignoring
nonessential speed-ups due to halting programs if there are any).
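
As a small worked example of the definition of $Kt$, the sketch below evaluates
$l(q) + \log t(q, x_n)$ over an explicitly given list of candidate programs. The list of
(length, runtime) pairs is hypothetical input; in general the minimum ranges over all
program prefixes, which this sketch does not enumerate.

```python
import math


def levin_complexity(candidates):
    """Kt = min over (length, time) pairs of length + log2(time)."""
    return min(length + math.log2(time) for length, time in candidates)


def fast_step_bound(k):
    """Step bound 2^(k+1) within which FAST generates every prefix with Kt <= k."""
    return 2 ** (k + 1)
```

For instance, a 3-bit program needing 1024 steps has cost 3 + 10 = 13, which beats a
12-bit program needing 4 steps (cost 14): short but slow programs and long but fast
programs are traded off on a logarithmic time scale.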

One difference between SIMPLE and FAST is that SIMPLE may allocate steps to algorithms
with a short description less frequently than FAST. Suppose no finite algorithm
computes x faster than $p^k$ which needs at most $f(n)$ instructions for $x_n$, for all n. While
SIMPLE needs $2^k f(n)$ steps to compute $x_n$, following Levin it can be shown that
FAST requires at most $2^{K(p^k)+1} f(n)$ steps (compare [?, p. 504 ff]). That is, SIMPLE
and FAST share the same order of time complexity (ignoring SIMPLE's "micro-steps" for
storage organization), but FAST's constant factor tends to be better.

Note that an observer A evolving in one of the universes computed by FAST might 
decide to build a machine that simulates all possible computable universes using FAST, 
and so on, recursively. Interestingly, this will not necessarily cause a dramatic exponential 
slowdown: if the n-th discrete time step of A's universe (compare Example 1.1) is computable
within $O(n)$ time then A's simulations can be as fast as the "original" simulation, save for
a constant factor. In this sense a "Great Programmer" [72] who writes a program that runs
all possible universes would not be superior to certain nested Great Programmers inhabiting 
his universes. 

To summarize: the effort required for computing all computable objects simultaneously 
does not prevent FAST from computing each object essentially as quickly as its fastest 
algorithm. No other dovetailer can have a better order of computational complexity. This 
suggests a notion of describability that is much more restricted yet perhaps much more 
natural than the one used in the earlier sections on description size-based complexity. 



6.3 Speed-Based Characterization of the Describable 

The introduction mentioned that some sets seem describable in a certain sense while most 
of their elements are not. Although the dyadic expansions of most real numbers are not 
individually describable, the short algorithm ALPHABET from Section 6.1 will compute all
their finite prefixes. However, ALPHABET is unable to print any infinite string using only
countable time and storage. Rejection of the notion of uncountable storage and time steps
leads to a speed-based definition of describability.

Definition 6.1 ("S-describable" Objects) Some $x \in B^\sharp$ is S-describable ("S" for "Speed")
if it has a finite algorithm that outputs x using countable time and space.



Lemma 6.1 With countable time and space requirements, FAST computes all S-describable
strings.

To see this, recall that FAST will output any S-describable string as fast as its fastest algorithm,
save for a constant factor. Those x with polynomial time bounds on the computation
of $x_n$ (e.g., $O(n^{37})$) are S-describable, but most $x \in B^\sharp$ are not, as obvious from Cantor's
insight [?].

The prefixes $x_n$ of all $x \in B^\sharp$, even of those that are not S-describable, are computed
within at most $O(n 2^n)$ steps, at least as quickly as by ALPHABET. The latter, however,
never is faster than that, while FAST often is. Now consider infinite strings x whose fastest
individual finite program needs even more than $O(n 2^n)$ time steps to output $x_n$ and nothing
but $x_n$, such as Chaitin's $\Omega$ (or the even worse z from Theorem 3.3); recall that the time
for computing $\Omega_n$ grows faster than any recursive function of n. We observe that this
result is irrelevant for FAST which will output $\Omega_n$ within $O(n 2^n)$ steps, but only because
it also outputs many other strings besides $\Omega_n$; there is still no fast way of identifying $\Omega_n$
among all the outputs. $\Omega$ is not S-describable because it is not generated any more quickly
than uncountably many other infinite and incompressible strings, which are not S-describable
either.



6.4 Enumerable Priors vs FAST 

The FAST algorithm gives rise to a natural prior measure on the computable objects which
is much less dominant than $\mu^M$, $\mu^E$ and $\mu^G$. This prior will be introduced in Section 6.5
below. Here we first motivate it by evaluating drawbacks of the traditional, well-studied,
enumerable prior $\mu^M$ [?] in the context of FAST.

Definition 6.2 ($p \to x$, $p \to_i x$) Given program prefix p, write $p \to x$ if our MTM reads p
and computes output starting with $x \in B^*$, while no prefix of p consisting of less than $l(p)$
bits outputs x. Write $p \to_i x$ if $p \to x$ in PHASE i of FAST.

We observe that

$\mu^M(x) = \lim_{i \to \infty} \sum_{p \to_i x} 2^{-l(p)}$, (45)

but there is no recursive function $i(x)$ such that

$\mu^M(x) = \sum_{p \to_{i(x)} x} 2^{-l(p)}$, (46)

otherwise $\mu^M(x)$ would be recursive. Therefore we might argue that the use of prior $\mu^M$
is essentially equivalent to using a probabilistic version of FAST which randomly selects a
phase according to a distribution assigning zero probability to any phase with recursively
computable number. Since the time and space consumed by PHASE i is at least $O(2^i)$, we are
approaching uncountable resources as i goes to infinity. From any reasonable computational
perspective, however, the probability of a phase consuming more than countable resources
clearly should be zero. This motivates the next subsection.

6.5 Speed Prior S and Algorithm GUESS



A resource-oriented point of view suggests the following postulate. 

Postulate 6.1 The cumulative prior probability measure of all x incomputable within time 
t by the most efficient way of computing everything should be inversely proportional to t. 

Since the most efficient way of computing all x is embodied by FAST, and since each 
phase of FAST consumes roughly twice the time and space resources of the previous phase, 
the cumulative prior probability of each finite phase should be roughly half the one of the 
previous phase; zero probability should be assigned to infinitely resource-consuming phases. 
Postulate 6.1 therefore suggests the following definition.

Definition 6.3 (Speed Prior S) Define the speed prior S on $B^*$ as

$S(x) := \sum_{i=1}^{\infty} 2^{-i} S_i(x)$; where $S_i(\lambda) = 1$; $S_i(x) = \sum_{p \to_i x} 2^{-l(p)}$ for $x \succ \lambda$.

We observe that $S(x)$ is indeed a semimeasure (compare Def. 4.1):

$S(x0) + S(x1) + \bar{S}(x) = S(x)$; where $\bar{S}(x) \ge 0$.

Since $x \in B^*$ is first computed in PHASE $Kt(x)$ within $2^{Kt(x)+1}$ steps, we may rewrite:

$S(x) = 2^{-Kt(x)} \sum_{i=1}^{\infty} 2^{-i} S_{Kt(x)+i-1}(x) \le 2^{-Kt(x)}$. (47)

S can be implemented by the following probabilistic algorithm for a universal MTM. 

Algorithm GUESS: 

1. Toss an unbiased coin until heads is up; let i denote the number of
required trials; set t := $2^i$.

2. If the number of steps executed so far exceeds t then exit. Execute 
one step; if it is a request for an input bit, toss the coin to determine 
the bit, and set t := t/2. 

3. Go to 2. 
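
The three steps of GUESS can be simulated as follows. This is a minimal sketch, not the
original implementation: the universal MTM is replaced by a hypothetical callable
`machine_step(read_bit)` that executes one instruction and may call `read_bit()` once to
request an input bit, and the toy machine `make_toy_machine` (which requests a bit on
every third instruction) exists only to exercise the loop.

```python
import random


def guess(machine_step, rng):
    """One run of GUESS on a toy machine; returns (program_bits, steps, i)."""
    # Step 1: toss a fair coin until heads; i = number of trials; t := 2^i.
    i = 1
    while rng.random() < 0.5:  # tails with probability 1/2
        i += 1
    t = 2.0 ** i

    program = []

    def read_bit():
        bit = rng.randint(0, 1)  # coin toss determines the requested input bit
        program.append(bit)
        return bit

    steps = 0
    # Steps 2 and 3: exit once the number of executed steps exceeds t.
    while steps <= t:
        bits_before = len(program)
        machine_step(read_bit)
        steps += 1
        if len(program) > bits_before:  # the step was a request for an input bit
            t /= 2
    return program, steps, i


def make_toy_machine():
    """Hypothetical machine: requests an input bit on every third instruction."""
    state = {"n": 0}

    def step(read_bit):
        state["n"] += 1
        if state["n"] % 3 == 1:
            read_bit()

    return step
```

Because t never increases after step 1, a run that starts with budget $2^i$ executes at
most $2^i + 1$ steps in this sketch, and each requested bit halves the remaining budget.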

In the spirit of FAST, algorithm GUESS makes twice the computation time half as likely, 
and splits remaining time in half whenever a new bit is requested, to assign equal runtime 
to the two resulting sets of possible program continuations. Note that the expected runtime
of GUESS is unbounded since $\sum_i 2^{-i} 2^i$ does not converge. Expected runtime is countable,
however, and expected space is of the order of expected time, due to numerous short
algorithms producing a constant number of output bits per constant time interval.

Assuming our universe is sampled according to GUESS implemented on some machine, 
note that the true distribution is not essentially different from the estimated one based on 
our own, possibly different machine.

6.6 Speed Prior-Based Inductive Inference

Given S, as we observe an initial segment $x \in B^*$ of some string, which is the most likely
continuation? Consider x's finite continuations $xy$, $y \in B^*$. According to Bayes (compare
Equation (15)),

$S(xy \mid x) = \frac{S(x \mid xy) S(xy)}{S(x)} = \frac{S(xy)}{S(x)}$, (48)

where $S(z^2 \mid z^1)$ is the measure of $z^2$, given $z^1$. Having observed x we will predict those y
that maximize $S(xy \mid x)$. Which are those? In what follows, we will confirm the intuition
that for $n \to \infty$ the only probable continuations of $x_n$ are those with fast programs. The
sheer number of "slowly" computable strings cannot balance the speed advantage of "more
quickly" computable strings with equal beginnings.

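Definition 6.4 ($p \xrightarrow{<k} x$ etc.) Write $p \xrightarrow{<k} x$ if finite program p ($p \to x$) computes x
within less than k steps, and $p \xrightarrow{<k}_i x$ if it does so within PHASE i of FAST. Similarly for
$p \xrightarrow{\le k} x$ and $p \xrightarrow{\le k}_i x$ (at most k steps), $p \xrightarrow{=k} x$ (exactly k steps), $p \xrightarrow{\ge k} x$ (at least k
steps), $p \xrightarrow{>k} x$ (more than k steps).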



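Theorem 6.1 Suppose $x \in B^\infty$ is S-describable, and $p^x \in B^*$ outputs $x_n$ within at most
$f(n)$ steps for all n, and $g(n) > O(f(n))$. Then

$Q(x, g, f) := \lim_{n \to \infty} \frac{\sum_{i=1}^{\infty} 2^{-i} \sum_{p \xrightarrow{\ge g(n)}_i x_n} 2^{-l(p)}}{\sum_{i=1}^{\infty} 2^{-i} \sum_{p \xrightarrow{\le f(n)}_i x_n} 2^{-l(p)}} = 0.$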

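Proof. Since no program that requires at least $g(n)$ steps for producing $x_n$ can compute $x_n$
in a phase with number $< \log g(n)$, we have

$Q(x, g, f) \le \lim_{n \to \infty} \frac{\sum_{i=1}^{\infty} 2^{-\log g(n) - i} \sum_{p \xrightarrow{\ge g(n)}_{i + \log g(n)} x_n} 2^{-l(p)}}{\sum_{i=1}^{\infty} 2^{-\log f(n) - i} \sum_{p \xrightarrow{\le f(n)}_{i + \log f(n)} x_n} 2^{-l(p)}} \le \lim_{n \to \infty} \frac{f(n) \sum_{p \to x_n} 2^{-l(p)}}{g(n) \, 2^{-l(p^x)}} \le \lim_{n \to \infty} \frac{f(n)}{g(n)} \cdot \frac{1}{2^{-l(p^x)}} = 0.$

Here we have used the Kraft inequality to obtain a rough upper bound for the enumerator:
when no p is prefix of another one, then $\sum_p 2^{-l(p)} \le 1$. □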



Hence, if we know a rather fast finite program $p^x$ for x, then Theorem 6.1 allows for predicting:
if we observe some $x_n$ (n sufficiently large) then it is very unlikely that it was produced
by an x-computing algorithm much slower than $p^x$.

Among the fastest algorithms for x is FAST itself, which is at least as fast as $p^x$, save
for a constant factor. It outputs $x_n$ after $O(2^{Kt(x_n)})$ steps. Therefore Theorem 6.1 tells us:

Corollary 6.1 Let $x \in B^\infty$ be S-describable. For $n \to \infty$, with probability 1 the continuation
of $x_n$ is computable within $O(2^{Kt(x_n)})$ steps.

Given observation x with $l(x) \to \infty$, we predict a continuation y with minimal $Kt(xy)$.

Example 6.1 Consider Example 1.2 and Equation (1). According to the weak anthropic
principle, the conditional probability of a particular observer finding herself in one of the 
universes compatible with her existence equals 1. Given S, we predict a universe with 
minimal Kt. Short futures are more likely than long ones: the probability that the universe's 
history so far will extend beyond the one computable in the current phase of FAST (that 
is, it will be prolongated into the next phase) is at most 50 %. Infinite futures have measure 
zero. 
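
The 50 % figure can be illustrated with exact arithmetic. The sketch below makes a
simplifying assumption that is ours: it treats the phase index as the geometric random
variable of GUESS step 1, with weight $2^{-i}$ for PHASE i, and ignores the additional
factors $S_i$ of Definition 6.3, which is why the text speaks of "at most" 50 %.

```python
from fractions import Fraction


def phase_tail(i):
    """Closed form of sum_{j >= i} 2^(-j) = 2^(-(i - 1)) for i >= 1."""
    return Fraction(1, 2 ** (i - 1))


def prob_beyond(i):
    """P(phase > i | phase >= i) under the geometric phase weights 2^(-i)."""
    return phase_tail(i + 1) / phase_tail(i)
```

Under this assumption the conditional probability of continuing into the next phase is
exactly one half for every i, and the probability of surviving m further phases is
$2^{-m}$, which tends to zero.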



6.7 Practical Applications of Algorithm GUESS 

Algorithm GUESS is almost identical to a probabilistic search algorithm used in previous
work on applied inductive inference [71], [73]. The programs generated by the previous algorithm,
however, were not bitstrings but written in an assembler-like language; their runtimes
had an upper bound, and the program outputs were evaluated as to whether they represented
solutions to externally given tasks.

Using a small set of exemplary training examples, the system discovered the weight 
matrix of an artificial neural network whose task was to map input data to appropriate 
target classifications. The network's generalization capability was then tested on a much 
larger unseen test set. On several toy problems it generalized extremely well in a way 
unmatchable by traditional neural network learning algorithms. 

The previous papers, however, did not explicitly establish the above-mentioned relation 
between "optimal" resource bias and GUESS. 



7 Consequences for Physics 

As obvious from equations (1) and (15), some observer's future depends on the prior from
which his/her universe is sampled. More or less general notions of TM-based describability
put forward above lead to more or less dominant priors such as $P^G$ on formally describable
universes, $P^E$ and $\mu^E$ on enumerable universes, $P^M$ and $\mu^M$ and recursive priors on monotonically
computable universes, S on S-describable universes. We will now comment on the
plausibility of each, and discuss some consequences. Prior S, the arguably most plausible
and natural one, provokes specific predictions concerning our future. For a start, however, we
will briefly review Solomonoff's traditional theory of inductive inference based on recursive
priors.

7.1 Plausibility of Recursive Priors 

The first number is 2, the second is 4, the third is 6, the fourth is 8. What is the fifth? The
correct answer is "250," because the nth number is $n^5 - 5n^4 - 15n^3 + 125n^2 - 224n + 120$. In
certain IQ tests, however, the answer "250" will not yield maximal score, because it does not
seem to be the "simplest" answer consistent with the data (compare [?]). And physicists
and others favor "simple" explanations of observations.
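
The arithmetic behind the "250" example is easy to verify. The short Python check below
evaluates the degree-5 polynomial next to the rule an IQ test presumably expects; the
function names are ours.

```python
def nth_number(n):
    """Degree-5 polynomial that reproduces 2, 4, 6, 8 and then yields 250."""
    return n**5 - 5 * n**4 - 15 * n**3 + 125 * n**2 - 224 * n + 120


def simplest_rule(n):
    """The 'simple' rule consistent with the same data: the n-th even number."""
    return 2 * n
```

Both rules agree on the first four numbers and differ on the fifth (250 versus 10), so
the observed data alone cannot separate them; only a prior over rules can.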

Roughly forty years ago Solomonoff set out to provide a theoretical justification of this
quest for simplicity [?]. He and others have made substantial progress over the past decades.
In particular, technical problems of Solomonoff's original approach were partly overcome by
Levin [?] who introduced self-delimiting programs, m and $\mu^M$ mentioned above, as well
as several theorems relating probabilities to complexities (see also Chaitin's and Gacs'
independent papers on prefix complexity and m [35], [?]). Solomonoff's work on inductive
inference helped to inspire less general yet practically more feasible principles of minimum
description length [95, 66, 44] as well as time-bounded restrictions of Kolmogorov complexity,
e.g., [?], [?], [?], as well as the concept of "logical depth" of x, the runtime of the shortest
program of x [?].



Equation (15) makes predictions of the entire future, given the past. This seems to be
the most general approach. Solomonoff focuses just on the next bit in a sequence. Although
this provokes surprisingly nontrivial problems associated with translating the bitwise
approach to alphabets other than the binary one (only recently Hutter managed to do this
[?]), it is sufficient for obtaining essential insights [83].

Given an observed bitstring x, Solomonoff assumes the data are drawn according to a
recursive measure $\mu$; that is, there is a MTM program that reads $x \in B^*$ and computes $\mu(x)$
and halts. He estimates the probability of the next bit (assuming there will be one), using
the fact that the enumerable $\mu^M$ dominates the less general recursive measures:

$K\mu^M(x) \le -\log \mu(x) + c_\mu$, (49)

where $c_\mu$ is a constant depending on $\mu$ but not on x. Compare [?, p. 282 ff]. Solomonoff
showed that the $\mu^M$-probability of a particular continuation converges towards $\mu$ as the
observation size goes to infinity [83]. Hutter recently extended his results by showing that the
number of prediction errors made by universal Solomonoff prediction is essentially bounded
by the number of errors made by any other recursive prediction scheme, including the optimal
scheme based on the true distribution $\mu$. Hutter also extended Solomonoff's passive
universal induction framework to the case of agents actively interacting with an unknown
environment [?].



A previous paper on computable universes [72, Section: Are we Run by a Short Algorithm?]
applied the theory of inductive inference to entire universe histories, and predicted
that simple universes are more likely; that is, observers are likely to find themselves in a
simple universe compatible with their existence (compare everything mailing list archive [?],
messages dated 21 Oct and 25 Oct 1999: http://www.escribe.com/science/theory/m1284.html
and m1312.html). There are two ways in which one could criticize this approach. One suggests
it is too general, the other suggests it is too restrictive.

1. Recursive priors too general? $\mu^M(x)$ is not recursively computable, hence there
is no general practically feasible algorithm to generate optimal predictions. This suggests
looking at more restrictive priors, in particular, S, which will receive additional motivation
further below.

2. Recursive priors too restricted? If we want to explain the entire universe, then
the assumption of a recursive P on the possible universes may even be insufficient. In
particular, although our own universe seems to obey simple rules (a discretized version of
Schrodinger's wave function could be implemented by a simple recursive algorithm), the
apparently noisy fluctuations that we observe on top of the simple laws might be due to a
pseudorandom generator (PRG) subroutine whose output is describable, even enumerable,
but not recursive (compare Example 2.1).

In particular, the fact that nonrecursive priors may not allow for recursive bounds on 
the time necessary to compute initial histories of some universe does not necessarily prohibit 
nonrecursive priors. Each describable initial history may be potentially relevant as there is 
an infinite computation during which it will be stable for all but finitely many steps. This
suggests looking at more general priors such as $\mu^E$, $P^E$, $P^G$, which will be done next, before
we come back to the speed prior S.



7.2 Plausibility of Cumulatively Enumerable Priors 

The semimeasure $\mu^M$ used in the traditional theory of inductive inference is dominated by
the nonenumerable yet approximable $\mu^E$ (Def. 4.17) assigning approximable probabilities to
initial segments of strings computable on EOMs.

As Chaitin points out [?], enumerable objects such as the halting probabilities of TMs
are already expressive enough to express anything provable by finite proofs, given a set
of mathematical axioms. In particular, knowledge of $\Omega^T_n$, the first n bits of the halting
probability of TM T, conveys all information necessary to decide by a halting program
whether any given statement of an axiomatic system describable by fewer than $n - O(1)$ bits
is provable or not within the system.

$\Omega^T_n$ is effectively random in the sense of Martin-Lof [61]. Therefore it is generally undistinguishable
from noise by a recursive function of n, and thus very compact in a certain sense;
in fact, all effectively random reals are Omegas, as recently shown by Slaman [?] building
on work by Solovay; see also [?]. One could still say, however, that $\Omega$ decompresses
mathematical truth at least enough to make it retrievable by a halting program. Assuming
that this type of mathematical truth contains everything relevant for a theory of all
reasonable universes, and assuming that the describable yet even "more random" patterns
of Theorem 3.3 are not necessary for such a theory, we may indeed limit ourselves to the
enumerable universes.

If Conjecture 5.2 were true, then we would have $P^E(x) = O(2^{-K^E(x)})$ (compare Equation
(1)), or $P^E(xy) = O(2^{-K^E(xy)})$ (compare (15)). That is, the most likely continuation y
would essentially be the one corresponding to the shortest algorithm, and no cumulatively
enumerable distribution could assign higher probability than $O(2^{-K^E(xy)})$ to xy. Maximizing
$P^E(xy)$ would be equivalent to minimizing $K^E(xy)$.

Since the upper bound given by Theorem 5.3 is not quite as sharp due to the additional,
at most logarithmic term, we cannot make quite as strong a statement. Still, Theorem
5.3 does tell us that $P^E(xy)$ goes to zero with growing $K^E(xy)$ almost exponentially fast,
and Theorem ? says that $\mu^E(xy_k)$ (k fix) goes to zero with growing $Km^E(xy_k)$ almost
exponentially fast.

Hence, the relatively mild assumption that the probability distribution from which our
universe is drawn is cumulatively enumerable provides a theoretical justification of the prediction
that the most likely continuations of our universes are computable by short EOM
algorithms. However, given $P^E$, Occam's razor (e.g., [?]) is only partially justified because
the sum of the probabilities of the most complex xy does not vanish:

$\lim_{n \to \infty} \sum_{xy \in B^\sharp : K^E(xy) > n} P^E(xy) > 0.$



To see this, compare Def. 4.12 and the subsequent paragraph on program continua. There
would be a nonvanishing chance for an observer to end up in one of the maximally complex 
universes compatible with his existence, although only universes with finite descriptions have 
nonvanishing individual probability. 

We will conclude this subsection by addressing the issue of falsifiability. If $P^E$ or $\mu^E$
were responsible for the pseudorandom aspects of our universe (compare Example 2.1),
then this might indeed be effectively undetectable in principle, because some approximable
and enumerable patterns cannot be proven to be nonrandom in recursively bounded time. 
Therefore the results above may be of interest mainly from a philosophical point of view, not 
from a practical one: yes, universes computable by short EOM algorithms are much more 
likely indeed, but even if we inhabit one then we may not be able to find its short algorithm. 



7.3 Plausibility of Approximable Priors 

$\mu^E$ assigns low probability to G-describable strings such as the z of Theorem 3.3. However,
one might believe in the potential significance of such constructively describable patterns,
e.g., by accepting their validity as possible pseudorandom perturbations of a universe otherwise
governed by a quickly computable algorithm implementing simple physical laws
(compare Example 2.1). Then one must also look at semimeasures dominating $\mu^E$, although
the falsifiability problem mentioned above holds for those as well.

The top of the TM dominance hierarchy is embodied by G (Theorem 3.3); the top of
our prior dominance hierarchy by $P^G$, the top of the corresponding semimeasure dominance
hierarchy by $\mu^G$. If Conjecture 5.3 were true, then maximizing $P^G(xy)$ would be equivalent
to minimizing $K^G(xy)$. Even then there would be a fundamental problem besides lack of
falsifiability: Neither $P^G$ nor $\mu^G$ is describable, and not even a "Great Programmer"
could generally decide whether some GTM output is going to converge (Theorem 2.1), or
whether it actually represents a "meaningless" universe history that never stabilizes.

Thus, if one adopts the belief that nondescribable measures do not exist, simply because
there is no way of describing them, then one may discard this option.

This would suggest considering semimeasures less dominant than $\mu^G$, for instance, one of
the most dominant approximable $\mu$. According to Theorem ?.5 and inequality (43), $\mu(xy)$
goes to zero almost exponentially fast with growing $Km^G(xy)$.

As in the case of $\mu^E$, this may interest the philosophically inclined more than the pragmatists:
yes, any particular universe history without short description necessarily is highly
unlikely; much more likely are those histories where our lives are deterministically computed
by a short algorithm, where the algorithmic entropy (compare [?]) of the universe does
not increase over time, because a finite program conveying a finite amount of information is 
responsible for everything, and where concepts such as "free will" are just an illusion in a 
certain sense. Nevertheless, there may not be any effective way of proving or falsifying this. 



7.4 Plausibility of Speed Prior S 

Starting with the traditional case of recursive priors, the subsections above discussed more 
and more dominant priors as candidates for the one from which our universe is sampled. 
Now we will move towards the other extreme: the less dominant prior S which in a sense is 
optimal with respect to temporal complexity. 

So far, without much ado, we have used a terminology according to which we "draw a 
universe from a particular prior distribution." In the TM-based set-ups (see Def. 4.12) this
in principle requires a "binary oracle," a source of true randomness, to provide the TM's
inputs. Any source of randomness, however, leaves us with an unsatisfactory explanation of
the universe, since random strings do not have a compact explanation, by definition. The
obvious way around this, already implicit in the definition of $\mu_T(x)$ (see Def. 4.17), is the
"ensemble approach" which runs all possible TM input programs and sums over the lengths
of those that compute strings starting with x.

Once we deal with ensemble approaches and explicit computations in general, however, 
we are forced to accept their fundamental time constraints. As mentioned above, many of 
the shortest programs of certain enumerable or describable strings compute their outputs 
more slowly than any recursive upper bound could indicate. 

If we do assume that time complexity of the computation should be an issue, then why 
stop with the somewhat arbitrary restriction of recursiveness, which just says that the time 
required to compute something should be computable by a halting program? Similarly, why 
stop with the somewhat arbitrary restriction of polynomial time bounds which are subject 
of much of the work in theoretical computer science? 

If I were a "Great Programmer" [72] with substantial computing resources, perhaps
beyond those possible in our own universe which apparently does not permit more than
$10^{51}$ operations per second and kilogram [?], [?], yet constrained by the fundamental limits
of computability, I would opt for the fastest way of simulating all universes, represented
by algorithm FAST (Section 6.2). Similarly, if I were to search for some computable object
with certain properties discoverable by Levin's universal search algorithm (the "mother" of 
FAST), I would use the latter for its optimality properties. 

Consider the observers evolving in the many different possible universes computed by 
FAST or as a by-product of Levin Search. Some of them would be identical, at least for 
some time, collecting identical experiences in universes with possibly equal beginnings yet 
possibly different futures. At a given time, the most likely instances of a particular observer 
A would essentially be determined by the fastest way of computing A. 

Observer A might adopt the belief that the Great Programmer was indeed smart enough to
implement the most efficient way of computing everything. And given A's very existence, A
can conclude that the Great Programmer's resources are sufficient to compute at least one
instance of A. What A does not know, however, is the current phase of FAST, or whether 
the Great Programmer is interested in or aware of A, or whether A is just an accidental 
by-product of some Great Programmer's search for something else, etc. 

Here is where a resource-oriented bias comes in naturally. It seems to make sense for 
A to assume that the Great Programmer is also bound by the limits of computability, that 
infinitely late phases of FAST consuming uncountable resources are infinitely unlikely, that 
any Great Programmer's a priori probability of investing computational resources into some 
search problem tends to decrease with growing search costs, and that the prior probability 
of anything whose computation requires more than O(n) resources by the optimal method
is indeed inversely proportional to n. This immediately leads to the speed prior S.

Believing in S, A could use Theorem 6.1 to predict the future (or "postdict" unknown
aspects of the past) by assigning highest probability to those S-describable futures (or pasts)
that are (a) consistent with A's experiences and (b) computable by short and fast
algorithms. The appropriate simplicity measure minimized by this resource-oriented version of
Occam's razor is the Levin complexity Kt.
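For reference, Levin complexity is usually written as follows, where l(p) is the length of program p in bits and t(p, x) is the number of steps p needs to compute x. This is the standard textbook form, given here as a reminder under that assumption rather than as a quotation of the definitions used earlier in the paper.

```latex
Kt(x) = \min_{p} \left\{\, l(p) + \log_2 t(p, x) \,\right\}
```

In this measure, doubling the runtime of a program costs exactly as much as lengthening it by one bit.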



7.5 S-Based Predictions

If our universe is indeed sampled from the speed prior S, then we might well be able to
discover the algorithm for the apparent noise on top of the seemingly simple physical laws
(compare the earlier Example). It may not be trivial, as trivial pseudorandom generators (PRGs)
may not be quite sufficient for evolution of observers such as ourselves, given the other laws
of physics. But it should be much less time-consuming than, say, an algorithm computing
the z of Theorem ^]3| which are effectively indistinguishable from true, incompressible noise.

Based on prior S, we predict: anything that appears random or noisy in our own
particular world is due to hitherto unknown regularities that relate seemingly disconnected
events to each other via some simple algorithm that is not only short (the short algorithms
are favored by all describable measures above) but also fast. This immediately leads to more
specific predictions.



7.5.1 Beta Decay 

When exactly will a particular neutron decay into a proton, an electron and an antineutrino? 
Is the moment of its death correlated with other events in our universe? Conventional wisdom 
rejects this idea and suggests that beta decay is a source of true randomness. According 
to S, however, this cannot be the case. Never-ending true randomness is neither formally
describable (Def. 2.5) nor S-describable (Def. 6.1); its computation would not be possible
using countably many computational steps.

This encourages a re-examination of beta decay or other types of particle decay: given 
S, a very simple and fast but maybe not quite trivial PRG should be responsible for the 
decay pattern of possibly widely separated neutrons. (If the PRG were too trivial and too 
obvious then maybe the resulting universe would be too simple to permit evolution of our
type of consciousness, thus being ruled out by the weak anthropic principle.) Perhaps the
main reason for the current absence of empirical evidence in this vein is that nobody has 
systematically looked for it yet. 



7.5.2 Many World Splits 



Everett's many worlds hypothesis P3| essentially states: whenever our universe's quantum
mechanics based on Schrödinger's equation allows for alternative "collapses of the wave
function," all are made and the world splits into separate universes. The previous paper
[72] already pointed out that from our algorithmic point of view there are no real splits;
there are just a bunch of different algorithms which yield identical results for some time,
until they start computing different outputs corresponding to different possible observations
in different universes. According to the describable priors and measures discussed above,
S among them, however, most of these alternative continuations are much less likely than
others.

In particular, the outcomes of experiments involving entangled states, such as the
observations of spins of initially close but soon distant particles with correlated spins, are
currently widely assumed to be random. Given S, however, whenever there are several possible
continuations of our universe corresponding to different wave function collapses, and all are
compatible with whatever it is we call our consciousness, we are more likely to end up in one
computable by a short and fast algorithm. A re-examination of split experiment data might
reveal unexpected, nonobvious, nonlocal algorithmic regularity due to a PRG.

This prediction runs against current mainstream trends in physics, with the possible 
exception of hidden variable theory, e.g., R, |90|. 



7.5.3 Expected Duration of the Universe 

Given S, the probability that the history of the universe so far will reach into the next phase
of FAST is at most 1/2 (compare Example 6.1). Does that mean there is a 50% chance that
our universe will get at least twice as old as it is now? Not necessarily, if the computation
of its state at the n-th time step (local time) requires more than O(n) time.

As long as there is no compelling contrarian evidence, however, a reasonable guess would
be that our universe is indeed among the fastest ones with O(1) output bits per constant time
interval consumed by algorithm FAST. It may even be "locally" computable through simple
simulated processors, each interacting with only a few neighbouring processors, assuming that
the pseudorandom aspects of our universe do not require any more global communication
between spatio-temporally separated parts than the well-known physical laws. Note that the
fastest universe evolutions include those representable as sequences of substrings of constant
length l, where each substring stands for the universe's discretized state at a certain discrete
time step and is computable from the previous substring in O(l) time (compare Example
Ol). However, the fastest universes also include those whose representations of successive
discrete time steps do grow over time and where more and more time is spent on their
computation. The expansion of certain computable universes actually requires this.

In any case, the probability that ours will last 2^n times longer than it has lasted so far
is at most 2^-n (except, of course, when its early states are for some reason much harder to
compute than later ones and we are still in an early state). This prediction also differs from
those of current mainstream physics (compare though), but obviously is not verifiable.
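The bound is easy to tabulate. The following Python sketch merely evaluates 2^-n for a few values of n. The function name is hypothetical, and the sketch assumes the bound applies, that is, that early states are not much harder to compute than later ones.

```python
from fractions import Fraction

def survival_bound(n):
    """Upper bound on the probability of lasting 2**n times longer than so far."""
    if n < 0:
        raise ValueError("n must be nonnegative")
    return Fraction(1, 2 ** n)

if __name__ == "__main__":
    for n in (1, 2, 10):
        print(f"factor {2 ** n}: probability at most {survival_bound(n)}")
```

Under S, for instance, a universe that lasts about a thousand times longer than it has so far is given a probability of at most 1/1024.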



7.6 Short Algorithm Detectable? 

Simple PRG subroutines of the universe may not necessarily be easy to find. For instance, 
the second billion bits of π's dyadic expansion "look" highly random although they are not,
because they are computable by a very short algorithm. Another problem with existing data 
may be its potential incompleteness. To exemplify this: it is easy to see the pattern in an 
observed sequence 1, 2, 3, ... , 100. But if many values are missing, resulting in an observed 
subsequence of, say, 7, 19, 54, 57, the pattern will be less obvious. 
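To make the first point concrete, here is a minimal Python sketch of a short, fast, deterministic generator whose output bits nevertheless look irregular at a glance. The linear congruential constants are the widely used Numerical Recipes values and the function name is hypothetical. Nothing here is meant as a candidate for the PRG of our universe.

```python
def lcg_bits(seed, count, a=1664525, c=1013904223, m=2 ** 32):
    """Return `count` pseudorandom bits: the top bit of each successive LCG state."""
    state = seed % m
    bits = []
    for _ in range(count):
        state = (a * state + c) % m   # one cheap update per output bit
        bits.append(state >> 31)      # keep only the most significant of 32 bits
    return bits

if __name__ == "__main__":
    stream = lcg_bits(seed=1, count=64)
    print("".join(map(str, stream)))
    # Observing only a few scattered positions makes the rule even harder to spot.
    print([stream[i] for i in (7, 19, 54, 57)])
```

The whole generator fits in a few lines and needs constant time per bit, yet recovering its rule from a sparse sample of its output is far from obvious.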

A systematic enumeration and execution of all candidate algorithms in the time-optimal
style of Levin search should find one consistent with the data essentially as quickly as
possible. Still, currently we do not have an a priori upper bound on the search time. This
points to a problem of falsifiability.
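A toy version of such a search can be sketched in Python. The "interpreter" is a deliberate simplification and an assumption of this sketch: a program is a bitstring that is emitted cyclically, one bit per step, instead of code for a universal machine. The names `run`, `levin_search` and `max_phase` are hypothetical.

```python
from itertools import product

def run(program, steps):
    """Toy interpreter (assumed): emit one bit per step, cycling through `program`."""
    return [int(program[t % len(program)]) for t in range(steps)]

def levin_search(data, max_phase=20):
    """Return (program, phase) for the first program found that reproduces `data`."""
    for phase in range(1, max_phase + 1):
        for length in range(1, phase + 1):
            budget = 2 ** (phase - length)   # steps granted to programs of this length
            if budget < len(data):
                continue                     # too little time to output all of `data`
            for bits in product("01", repeat=length):
                program = "".join(bits)
                if run(program, len(data)) == data:
                    return program, phase
    return None                              # nothing found within the phase cutoff

if __name__ == "__main__":
    print(levin_search([0, 1, 1, 0, 1, 1, 0, 1]))
```

The cutoff `max_phase` is needed only to make the sketch terminate. It mirrors the caveat in the text: without an a priori bound on the search time, a failed search up to any finite phase does not refute the existence of a short, fast program.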

Another caveat is that the algorithm computing our universe may somehow be wired up
to defend itself against the discovery of its simple PRG. According to Heisenberg we cannot
observe the precise, current state of a single electron, let alone our universe, because our
actions seem to influence our measurements in a fundamentally unpredictable way. This does
not rule out a predictable underlying computational process whose deterministic results we
just cannot access (compare hidden variable theory ||^, |1^, Q). More research, however,
is necessary to determine to what extent such fundamental undetectability is possible in
principle from a computational perspective (compare ^7\, |68|).

For now there is no reason why believers in S should let themselves get discouraged too
quickly from searching for simple algorithmic regularity in apparently noisy physical events
such as beta decay and "many world splits" in the spirit of Everett [^|. The potential
rewards of such a revolutionary discovery would merit significant experimental and analytic
efforts.



7.7 Relation to Previous Work on All Possible Universes 

A previous paper on computable universes [72] already pointed out that computing all
universes with all possible types of physical laws tends to be much cheaper in terms of
information requirements than computing just one particular, arbitrarily chosen one, because
there is an extremely short algorithm that systematically enumerates and runs all computable
universes, while most individual universes have very long shortest descriptions. The subset
embodied by the many worlds of Everett's "many worlds hypothesis" was considered
a by-product of this more general set-up.

The previous paper apparently also was the first to apply the theory of inductive inference
to entire universe histories [72, Section: Are we Run by a Short Algorithm?], using the
Solomonoff-Levin distribution to predict that simple universes are more likely; that is, the
most probable universe is the simplest one compatible with our existence, where simplicity
is defined in terms of traditional Kolmogorov complexity. Compare the everything mailing
list archive: http://www.escribe.com/science/theory/m1284.html and m1312.html, as well as
recent papers by Standish and Soklakov |86 |, and see Calude and Meyerstein |^ for a
somewhat contrarian view.

The current paper introduces simplicity measures more dominant than the traditional
ones [50, 82, 11, 26, 100, 52, 54, 55], |3^, |3^, and provides a more general, more
technical, and more detailed account, incorporating several novel theoretical results based
on generalizations of Kolmogorov complexity and algorithmic probability. In particular,
it stretches the notions of computability and constructivism to the limits, by considering
not only MTM-based traditional computability but also less restrictive GTM-based and
EOM-based describability, and proves several relevant "Occam's razor theorems." Unlike
the previous paper it also analyzes fundamental time constraints on the computation of
everything, and derives predictions based on these restrictions.

Rather than pursuing the computability-oriented path laid out in [72], Tegmark recently
suggested what at first glance seems to be an alternative ensemble of possible universes based
on a (somewhat vaguely defined) set of "self-consistent mathematical structures", thus
going beyond his earlier, less general work on physical constants and Everett's many
world variants of our own particular universe (compare also Marchal's and Bostrom's
theses [60, 15]). It is not quite clear whether Tegmark would like to include universes that
are not formally describable according to Def. 2.5. It is well-known, however, that for any
set of mathematical axioms there is a program that lists all provable theorems in order of
the lengths of their shortest proofs encoded as bitstrings. Since the TM that computes all
bitstrings outputs all these proofs for all possible sets of axioms, Tegmark's view seems in
a certain sense encompassed by the algorithmic approach [72]. On the other hand, there are
many formal axiomatic systems powerful enough to encode all computations of all possible
TMs, e.g., number theory. In this sense the algorithmic approach is encompassed by number
theory.
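The program that "computes all bitstrings" is very short indeed. A minimal Python sketch, with a hypothetical function name, enumerates every finite bitstring in order of length, so that any proof encoded as a bitstring appears after finitely many steps:

```python
from itertools import count, islice, product

def all_bitstrings():
    """Yield every finite bitstring, shortest first, starting with the empty string."""
    for length in count(0):
        for bits in product("01", repeat=length):
            yield "".join(bits)

if __name__ == "__main__":
    print(list(islice(all_bitstrings(), 7)))
```

The enumeration itself carries almost no information. What is expensive to specify is the position of one particular string within it.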

The algorithmic approach, however, offers several conceptual advantages: (1) It provides
the appropriate framework for issues of information-theoretic complexity traditionally
ignored in pure mathematics, and imposes natural complexity-based orderings on the possible
universes and subsets thereof. (2) It taps into a rich source of theoretical insights on
computable probability distributions relevant for establishing priors on possible universes.
Such priors are needed for making probabilistic predictions concerning our own particular
universe. Although Tegmark suggests that "... all mathematical structures are a priori given
equal statistical weight" [89] (p. 27), there is no way of assigning equal nonvanishing
probability to all (infinitely many) mathematical structures. Hence we really need something
like the complexity-based weightings discussed in [72] and especially the paper at hand. (3)
The algorithmic approach is the obvious framework for questions of temporal complexity
such as those discussed in this paper, e.g., "what is the most efficient way of simulating all
universes?"



8 Concluding Remarks



There is an entire spectrum of ways of ordering the describable things, spanned by two 
extreme ways of doing it. Sections |^-^ analyzed one of the extremes, based on minimal 
constructive description size on generalized Turing Machines more expressive than those 
considered in previous work on Kolmogorov complexity and algorithmic probability and 
inductive inference. Section 6 discussed the other extreme based on the fastest way of
computing all computable things. 

Between the two extremes we find methods for ordering describable things by (a) their
minimal nonhalting enumerable descriptions (also discussed in Sections @-|^), (b) their
minimal halting or monotonic descriptions (this is the traditional theory of Kolmogorov
complexity or algorithmic information), and (c) the polynomial time complexity-oriented
criteria that are the subject of most work in theoretical computer science. Theorems in
Sections |]-^ reveal some of the structure of the computable and enumerable and
constructively describable things.

Both extremes of the spectrum as well as some of the intermediate points yield natural
prior distributions on describable objects. The approximable and cumulatively enumerable
description size-based priors suggest algorithmic theories of everything (TOEs)
partially justifying Occam's razor in a way more general than previous approaches: given
several explanations of your universe, those requiring few bits of information are much more
probable than those requiring many bits. However, there may not be an effective
procedure for discovering a compact and complete explanation even if there is one.

The resource-optimal, less dominant, yet arguably more plausible extreme (Section 6)
leads to an algorithmic TOE without excessive temporal complexity: no calculation of any
universe computable in countable time needs to suffer from an essential slow-down due to
simultaneous computation of all the others. Based on the rather weak assumption that the
world's creator is constrained by certain limits of computability, and considering that all of
us may be just accidental by-products of His optimally efficient search for a solution to some
computational problem, the resulting "speed prior" predicts that a fast and short algorithm
is responsible not only for the apparently simple laws of physics but even for what most
physicists currently classify as noise or randomness (Section |^). It may not be all that hard
to find; we should search for it.

Much of this paper highlights differences between countable and uncountable sets. It is
argued (Sections ^ ^) that things such as uncountable time and space and incomputable
probabilities actually should not play a role in explaining the world, for lack of evidence
that they are really necessary. Some may feel tempted to counter this line of reasoning by
pointing out that for centuries physicists have calculated with continua of real numbers, most
of them incomputable. Even quantum physicists who are ready to give up the assumption
of a continuous universe usually do take for granted the existence of continuous probability
distributions on their discrete universes, and Stephen Hawking explicitly said: "Although
there have been suggestions that space-time may have a discrete structure I see no reason
to abandon the continuum theories that have been so successful." Note, however, that all
physicists in fact have only manipulated discrete symbols, thus generating finite, describable
proofs of their results derived from enumerable axioms. That real numbers really exist
in a way transcending the finite symbol strings used by everybody may be a figment of
imagination; compare Brouwer's constructive mathematics [17, m and the
Löwenheim-Skolem Theorem [53, W^ which implies that any first order theory with an
uncountable model such as the real numbers also has a countable model. As Kronecker put it:
"Die ganze Zahl schuf der liebe Gott, alles Übrige ist Menschenwerk" ("God created the
integers, all else is the work of man" ||2^). Kronecker greeted with scepticism Cantor's
celebrated insight 1^ about real numbers, mathematical objects Kronecker believed did not
even exist.

A good reason to study algorithmic, noncontinuous, discrete TOEs is that they are the
simplest ones compatible with everything we know, in the sense that universes that cannot
even be described formally are obviously less simple than others. In particular, the speed
prior-based algorithmic TOE (Sections |^, ^) neither requires an uncountable ensemble of
universes (not even describable in the sense of Def. 6.1), nor infinitely many bits to specify
nondescribable real-valued probabilities or nondescribable infinite random sequences. One
may believe in the validity of algorithmic TOEs until (a) there is evidence against them, e.g.,
someone shows that our own universe is not formally describable and would not be possible
without, say, existence of incomputable numbers, or (b) someone comes up with an even
simpler explanation of everything. But what could that possibly be?

Philosophers tend to create theories inspired by recent scientific developments. For
instance, Heisenberg's uncertainty principle and Gödel's incompleteness theorem greatly
influenced modern philosophy. Are algorithmic TOEs and the "Great Programmer Religion"
[72] just another reaction to recent developments, some in hindsight obvious by-product of
the advent of good virtual reality? Will they soon become obsolete, as so many previous
philosophies? We find it hard to imagine so, even without a boost to be expected for
algorithmic TOEs in case someone should indeed discover a simple subroutine responsible for
certain physical events hitherto believed to be irregular. After all, algorithmic theories of
the describable do encompass everything we will ever be able to talk and write about. Other
things are simply beyond description.